WO2019200923A1 - Pinyin-based semantic recognition method, apparatus, and human-machine dialog system - Google Patents
Pinyin-based semantic recognition method, apparatus, and human-machine dialog system
- Publication number
- WO2019200923A1 (PCT/CN2018/117626)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sentence
- recognized
- statement
- pinyin
- vector
- Prior art date
Classifications
- G10L15/1815: Speech recognition; speech classification or search using natural language modelling; semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
- G06F40/30: Handling natural language data; semantic analysis
- G06F40/211: Handling natural language data; parsing; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06N3/04: Neural networks; architecture, e.g. interconnection topology
- G06N3/08: Neural networks; learning methods
- G10L15/04: Speech recognition; segmentation; word boundary detection
- G10L15/22: Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26: Speech recognition; speech to text systems
- G06N3/044: Neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/045: Neural networks; combinations of networks
- G10L15/16: Speech recognition; speech classification or search using artificial neural networks
Definitions
- Embodiments of the present disclosure relate to the field of human-machine dialog, and in particular to a pinyin-based semantic recognition method and apparatus, and a human-machine dialog system.
- a method for semantic recognition is provided.
- the pinyin sequence of the sentence to be recognized is obtained.
- the pinyin sequence includes a plurality of pinyin segments.
- word vectors of the plurality of pinyin segments are obtained.
- the word vectors of the plurality of pinyin segments are combined into a sentence vector of the sentence to be recognized.
- based on the sentence vector of the sentence to be recognized, an output vector of the sentence to be recognized is obtained using a neural network.
- based on the output vector, a reference sentence that is semantically similar to the sentence to be recognized is determined, and the semantics of the sentence to be recognized are identified as the semantics of the reference sentence.
- the pinyin segment is a pinyin of a word in the sentence to be recognized.
- the Pinyin fragment is a Pinyin letter of a word in the sentence to be recognized.
- the step of determining a reference sentence that is semantically similar to the sentence to be recognized based on its output vector includes: calculating the distance between the output vector of the sentence to be recognized and the output vector of a candidate reference sentence in a reference sentence set; when the distance is less than a threshold, determining the candidate reference sentence as a reference sentence that is semantically similar to the sentence to be recognized.
- the word vectors of the plurality of Pinyin segments are obtained using a word embedding model.
- the method further comprises training the word embedding model using the first training data.
- the first training data includes a pinyin sequence of a plurality of training sentences.
- the method further comprises: obtaining a pinyin sequence of each training sentence in at least one set of training sentences, wherein the training sentences in each set are semantically similar; and, for each set of training sentences: obtaining a word vector of each pinyin segment in the pinyin sequence of each training sentence; combining the word vectors of the pinyin segments in the pinyin sequence of each training sentence into a sentence vector of that training sentence; and training the neural network using the sentence vector of each training sentence such that the neural network produces the same output vector for every training sentence.
- in the step of obtaining the pinyin sequence of the sentence to be recognized, the pinyin sequence of a sentence to be recognized input by the user through a pinyin input method is obtained.
- alternatively, the voice information of the sentence to be recognized, uttered by the user, is obtained. Then, speech recognition is performed on the voice information to obtain text information corresponding to the voice information. Next, the text information is converted into the pinyin sequence of the sentence to be recognized.
- an apparatus for semantic recognition includes at least one processor and at least one memory storing a computer program. When the computer program is executed by the at least one processor, it causes the apparatus to: obtain a pinyin sequence of a sentence to be recognized, the pinyin sequence including a plurality of pinyin segments; obtain word vectors of the plurality of pinyin segments; combine the word vectors into a sentence vector of the sentence to be recognized; obtain an output vector of the sentence to be recognized using a neural network based on the sentence vector; determine a reference sentence that is semantically similar to the sentence to be recognized based on the output vector; and identify the semantics of the sentence to be recognized as the semantics of the reference sentence.
- an apparatus for semantic recognition includes: a pinyin sequence obtaining module configured to obtain a pinyin sequence of a sentence to be recognized; a word embedding module configured to obtain word vectors of the plurality of pinyin segments; a sentence vector obtaining module configured to combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized; a neural network module configured to obtain an output vector of the sentence to be recognized using a neural network based on the sentence vector; and a semantic recognition module configured to determine a reference sentence that is semantically similar to the sentence to be recognized based on the output vector, and to identify the semantics of the sentence to be recognized as the semantics of the reference sentence.
- a system for human-machine dialog comprising: an obtaining device configured to acquire a sentence to be recognized from a user; an apparatus for semantic recognition according to any one of the embodiments of the present disclosure; and an output device configured to, in response to determining a reference sentence that is semantically similar to the sentence to be recognized, obtain a reply associated with the reference sentence and output the reply to the user.
- a computer readable storage medium storing computer executable instructions is also provided; the instructions, when executed by a computer, cause the computer to perform the method for semantic recognition according to any one of the embodiments of the present disclosure.
- a computer system comprising a processor and a memory coupled to the processor is also provided; the memory stores program instructions, and the processor is configured to load and execute the program instructions in the memory to perform a method for semantic recognition according to any one of the embodiments of the present disclosure.
- FIG. 1 shows a schematic structural diagram of an exemplary human-machine dialog system in which a semantic recognition method and apparatus can be implemented in accordance with an embodiment of the present disclosure
- FIG. 2 shows a schematic dialog flow diagram of the human-machine dialog system shown in FIG. 1;
- FIG. 3 illustrates a flow chart of a semantic recognition method in accordance with an embodiment of the present disclosure
- FIG. 4 illustrates an exemplary training process for the word embedding model in a semantic recognition method in accordance with an embodiment of the present disclosure
- FIG. 5 illustrates an exemplary training process for the neural network in a semantic recognition method in accordance with an embodiment of the present disclosure
- FIG. 6 shows a schematic structural block diagram of a semantic recognition apparatus according to an embodiment of the present disclosure
- FIG. 7 shows a schematic structural block diagram of a semantic recognition apparatus according to an embodiment of the present disclosure.
- in the related art, an erroneous-word detection model is mainly used: each target word is paired with common words and checked one by one against the erroneous word-pair features in the model. If the test result is that the target word is erroneous, the erroneous word is replaced with the common word corresponding to it.
- the implementation steps of this method are cumbersome, and the processing of the wrong words requires manual labeling, which further increases the cost.
- FIG. 1 shows a schematic block diagram of an exemplary human-machine dialog system 100 in which a semantic recognition method and apparatus may be implemented in accordance with an embodiment of the present disclosure.
- the human-machine dialog system may include a smart terminal unit 110, a voice recognition server 120, a web server 130, and a semantic server 140.
- the smart terminal unit 110 may be a smart terminal such as a personal computer, a smart phone, a tablet computer, or the like.
- the smart terminal unit 110 may have a voice collection function, so that the user's voice information can be collected; a network communication function, so that the collected voice information can be sent to the voice recognition server 120 for processing and the recognition result can be sent to the web server 130; and a certain computing and storage capability, enabling the storage and computation involved in collecting and transmitting voice information and in other functions.
- the voice recognition server 120 can be a server computer system with a voice recognition function, which can use a third-party voice recognition service, such as the voice recognition services provided by companies like iFLYTEK (Keda Xunfei) and Baidu. After the smart terminal unit 110 sends the collected voice information to the voice recognition server 120, the voice recognition server 120 performs voice recognition on the voice information, generates corresponding text information, and returns the text information to the smart terminal unit 110.
- the smart terminal unit 110 may itself have a voice recognition function, and in this case, the human-machine dialog system 100 may not include the separate voice recognition server 120.
- the web server 130 can be a computer system having web service functionality and providing a web access interface.
- the web server 130 receives the text information sent by the smart terminal unit 110 as question information, sends the text information to the semantic server 140, and returns the result from the semantic server 140 to the smart terminal unit 110 as a reply.
- the semantic server 140 may be a computer system having a semantic understanding function for processing question information.
- a matching question is sought by matching the question information against the questions stored in a question-answer database.
- the question information is interpreted as the matched question, and the corresponding reply is then returned.
- the semantic server 140 includes functionality to provide semantic understanding services, as well as functionality to provide model training of models upon which semantic understanding relies.
- alternatively, the semantic server 140 may include only the semantic understanding service function, using a trained model to provide the semantic understanding service; the training of the model can then be performed on a separate server.
- the web server 130 and the semantic server 140 can be combined into a single server and implemented on a single computer system.
- the intelligent terminal unit 110, the voice recognition server 120, the web server 130, and the semantic server 140 may be communicably connected to each other through a network.
- the network may be, for example, any one or more of a computer network and/or a telecommunications network such as the Internet, a local area network, a wide area network, an intranet, and the like.
- Referring to FIG. 2, there is shown a schematic dialog flow diagram of the human-machine dialog system shown in FIG. 1. As shown in FIG. 2, the dialog flow includes the following steps:
- in step 201, the smart terminal unit 110 collects voice information through a microphone or the like, and then transmits the collected voice information to the voice recognition server 120 through the network.
- in step 202, the voice recognition server 120 performs voice recognition on the voice information collected by the smart terminal unit 110, generates text information (for example, Chinese character text or text in another language) as the voice recognition result, and returns it to the smart terminal unit 110.
- in step 203, after receiving the text information as the voice recognition result, the smart terminal unit 110 transmits it as question information (for example, packaged question information having a specific format) to the web server 130.
- in step 204, the web server 130 obtains the text information from the question information sent by the smart terminal unit 110 as the question text and sends it to the semantic server 140.
- in step 205, after receiving the question text, the semantic server 140 performs semantic recognition by matching the question text against the questions in a question-answer database, and returns the corresponding reply after finding the best-matching question.
- semantic recognition methods and apparatus in accordance with embodiments of the present disclosure are primarily implemented in the semantic server 140 of the dialog system 100.
- composition and dialog flow of the exemplary dialog system 100 in which the semantic recognition method and apparatus may be implemented according to an embodiment of the present disclosure are described above with reference to the accompanying drawings.
- the web server can also be implemented by other types of servers or local computer systems. Some systems may also not include a web server, but rather communicate directly with the semantic server by the intelligent terminal unit.
- the semantic recognition method and apparatus according to an embodiment of the present disclosure may also be implemented in other systems than the dialog system 100.
- the semantic recognition method and apparatus according to embodiments of the present disclosure can also be applied to any scenario in which text (for example, Chinese text) input using a pinyin input method is to be semantically recognized.
- for example, a semantic recognition method and apparatus according to an embodiment of the present disclosure may be used to semantically recognize the text output by a pinyin input method, in order to identify and/or replace typos.
- a system to which the semantic recognition method and apparatus according to embodiments of the present disclosure are applied may omit the voice recognition server and may instead include: a smart terminal unit for accepting the user's pinyin input and generating corresponding text information; a web server for receiving the text information from the smart terminal unit; and a semantic server for receiving the text information from the web server, semantically recognizing the text information, and returning a semantic recognition result.
- the smart terminal unit may include a device having a pinyin input method, such as a keyboard, a touch screen, etc., so that text can be input using the pinyin input method.
- the intelligent terminal unit may not include a voice collection function.
- Referring to FIG. 3, a flow diagram of a semantic recognition method in accordance with an embodiment of the present disclosure is shown. At least a portion of the method can be performed, for example, in the dialog system 100 shown in FIG. 1 and described above (e.g., primarily by the semantic server 140), or in other systems (e.g., systems using a pinyin input method).
- the semantic recognition method may include the following steps:
- in step 301, a pinyin sequence of the sentence to be recognized is obtained.
- the pinyin sequence includes a plurality of pinyin segments.
- This step 301 can be performed by, for example, the semantic server 140 in the dialog system 100 shown in FIG. 1; in this case, the semantic server 140 can obtain, from the web server 130 or the smart terminal unit 110, the text information converted from the user's speech, and convert it into the corresponding pinyin sequence.
- This step 301 can also be performed jointly by, for example, the semantic server 140, the smart terminal unit 110, the voice recognition server 120, and the web server 130 in the dialog system 100 shown in FIG. 1.
- the sentence to be recognized may include, for example, characters or words of a Chinese sentence, and may also include words of a sentence in another language such as English.
- the step 301 of obtaining the Pinyin sequence of the sentence to be recognized includes the substep of obtaining a Pinyin sequence of the sentence to be recognized input by the user through the Pinyin input method. This sub-step can be performed, for example, by a smart terminal unit using a pinyin input method.
- the step 301 of obtaining a Pinyin sequence of the statement to be recognized includes the following sub-steps:
- sub-step 1: obtain the voice information of the sentence to be recognized, uttered by the user.
- This sub-step can be performed, for example, by the intelligent terminal unit 110 in the dialog system 100.
- for example, the smart terminal unit 110 can obtain the voice information of the sentence "What year was this picture drawn" uttered by the user.
- sub-step 2: perform speech recognition on the voice information to obtain text information corresponding to the voice information.
- This sub-step can be performed, for example, by the speech recognition server 120 in the dialog system 100.
- for example, the voice recognition server 120 can perform voice recognition on the voice information of the sentence "What year was this picture drawn" and obtain the corresponding text information.
- sub-step 3: convert the text information into the pinyin sequence of the sentence to be recognized.
- This sub-step can be performed, for example, by the semantic server 140 in the dialog system 100.
- for example, the semantic server 140 can receive the text information "What year was this picture drawn" (这幅画是哪年画的). After the text is segmented into individual characters, it is converted into the pinyin sequence "zhe fu hua shi na nian hua de".
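To make this conversion concrete, the following is a minimal sketch of the text-to-pinyin step in Python; the open-source pypinyin package used here is an illustrative assumption, as the patent does not name any library:

```python
# A sketch of step 301: converting Chinese text into a pinyin sequence.
# Assumption: the pypinyin package (not named in the patent).
from pypinyin import lazy_pinyin

def to_pinyin_sequence(text):
    """Convert Chinese text into a list of tone-free pinyin syllables."""
    return lazy_pinyin(text)

print(to_pinyin_sequence("这幅画是哪年画的"))
# ['zhe', 'fu', 'hua', 'shi', 'na', 'nian', 'hua', 'de']
```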
- step 302 word vectors for the plurality of pinyin segments are obtained.
- This step 302 can be performed, for example, by the semantic server 140 of the dialog system 100 shown in Figure 1 or a semantic server in other systems.
- the plurality of pinyin segments are the pinyin of the individual characters (or words) of the sentence to be recognized.
- for example, the pinyin segments in the pinyin sequence "zhe fu hua shi na nian hua de" are "zhe", "fu", "hua", "shi", "na", "nian", "hua", "de".
- in some embodiments, before step 302, the method further includes a step 303, in which the pinyin of each character in the sentence to be recognized is split into an initial and a final.
- for example, the pinyin "zhe", "fu", "hua", "shi", "na", "nian", "hua", "de" of the characters in the pinyin sequence "zhe fu hua shi na nian hua de" is split into initials and finals, forming the pinyin segments "zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", "e".
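A minimal sketch of such a split follows, assuming the standard inventory of Mandarin initials; the patent does not prescribe an implementation, so the zero-initial handling below is our choice:

```python
# A sketch of step 303: splitting pinyin syllables into initials and finals.
# Two-letter initials must be matched before their one-letter prefixes.
INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "r", "z", "c", "s", "y", "w",
]

def split_syllable(syllable):
    """Split one pinyin syllable into [initial, final] (or [final] alone)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            final = syllable[len(initial):]
            return [initial, final] if final else [initial]
    return [syllable]  # zero-initial syllable such as "e" or "ai"

syllables = ["zhe", "fu", "hua", "shi", "na", "nian", "hua", "de"]
segments = [part for s in syllables for part in split_syllable(s)]
print(segments)
# ['zh', 'e', 'f', 'u', 'h', 'ua', 'sh', 'i',
#  'n', 'a', 'n', 'ian', 'h', 'ua', 'd', 'e']
```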
- the word vectors of the plurality of pinyin segments are obtained using a word embedding model.
- the word embedding model may embed a model for a trained word, and the training method may be as described later.
- the word embedding model can be any type of word embedding model known in the art.
- the word embedding model maps words from a vocabulary (in this application, the pinyin of Chinese characters or the initials and finals of that pinyin; words of languages such as English can also be used) to vectors in a vector space, which can be called word vectors.
- the word embedding model receives each pinyin segment in the pinyin sequence as an input, and outputs a word vector of each pinyin segment.
- for example, the word embedding model receives the pinyin segments "zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", "e" and outputs a word vector for each pinyin segment.
- the word embedding model is a Word2vec model.
- the Word2vec model is a commonly used family of word embedding models. These models are shallow, two-layer neural networks that are trained to reconstruct the linguistic contexts of words.
- Word2vec takes a text corpus as input and produces a vector space that typically has hundreds of dimensions. Each word in the corpus is assigned a corresponding vector in the space, the word vector. Word vectors are distributed in vector space such that word vectors of words having a common context in the corpus are located close to each other in vector space.
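As an illustration only, training such an embedding over pinyin segments could look like the following sketch; the gensim library and every hyperparameter value are assumptions, not taken from the patent:

```python
# A sketch of training a Word2vec model over pinyin segments.
# Assumptions: the gensim library and all hyperparameter values below.
from gensim.models import Word2Vec

# Each training "sentence" is the pinyin-segment sequence of one sentence.
corpus = [
    ["zh", "e", "f", "u", "h", "ua", "sh", "i",
     "n", "a", "n", "ian", "h", "ua", "d", "e"],
    # ... many more pinyin sequences from the text corpus
]

model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)
print(model.wv["zh"].shape)  # (100,): the word vector of segment "zh"
```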
- in step 304, the word vectors of the plurality of pinyin segments are combined into a sentence vector of the sentence to be recognized.
- Each element of the sentence vector is a word vector of each pinyin segment in the pinyin sequence of the sentence to be recognized.
- the sentence vector can be a multi-dimensional vector.
- for example, the sentence vector of the sentence "What year was this picture drawn" is composed of the word vectors of the pinyin segments "zh", "e", "f", "u", "h", "ua", "sh", "i", "n", "a", "n", "ian", "h", "ua", "d", "e".
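One plausible reading of this combination is stacking the word vectors row by row in segment order, since the patent says each element of the sentence vector is a word vector; the numpy sketch below is an assumption in that it fixes the matrix layout:

```python
# A sketch of step 304: combining word vectors into a sentence "vector"
# (here, a matrix with one row per pinyin segment, in order; the exact
# combination rule is our assumption, not fixed by the patent).
import numpy as np

def sentence_vector(segments, w2v_model):
    """Stack the word vectors of all pinyin segments, preserving order."""
    return np.stack([w2v_model.wv[s] for s in segments])

# m = sentence_vector(segments, model)  # shape (16, 100) for the example
```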
- This step 304 can be performed, for example, by the semantic server 140 of the dialog system 100 shown in FIG. 1 or a semantic server in other systems.
- in step 305, an output vector of the sentence to be recognized is obtained using a neural network, based on the sentence vector of the sentence to be recognized.
- This step 305 can be performed, for example, by the semantic server 140 of the dialog system 100 shown in FIG. 1 or a semantic server in other systems.
- the neural network can for example be stored in the form of software in the memory of the semantic server.
- the neural network may be a trained neural network, and the training method thereof may be as described later.
- the neural network may be any neural network or combination of several neural networks that are known in the art to be capable of analyzing natural language.
- the neural network may be a deep learning neural network such as Convolutional Neural Networks (CNN) or Long Short-Term Memory (LSTM).
- as is known in the art, a CNN can generally include an input layer, several convolutional layers each followed by an activation function layer, several sub-sampling layers interleaved with the convolutional layers, and an output layer.
- the input layer is for receiving input data.
- the convolution layer is used to perform convolution processing on data output from the previous layer.
- the convolutional layer has weights and offsets.
- the weights represent convolution kernels, and the offset is a scalar bias added to the output of the convolutional layer.
- each convolutional layer can include tens or hundreds of convolution kernels.
- Each CNN can include multiple convolution layers.
- the activation function layer is used to perform function transformation on the output data of the previous convolutional layer.
- the sub-sampling layer is used to down-sample the data from the previous layer, by methods including but not limited to: max-pooling, avg-pooling, random pooling, decimation (for example, selecting fixed pixels), and demultiplexing the output (demuxout, splitting an input image into a plurality of smaller images).
- the output layer can include an activation function and is used to produce the output data.
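For orientation, a tiny text-CNN of this shape might look like the following PyTorch sketch; all sizes (channel count, kernel width, output dimension) are illustrative assumptions, since the patent fixes no architecture:

```python
# A sketch of a CNN mapping a sentence matrix to an output vector (step 305).
# All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, embed_dim=100, out_dim=64):
        super().__init__()
        self.conv = nn.Conv1d(embed_dim, 128, kernel_size=3, padding=1)
        self.act = nn.ReLU()                  # activation function layer
        self.pool = nn.AdaptiveMaxPool1d(1)   # sub-sampling by max-pooling
        self.out = nn.Linear(128, out_dim)    # output layer

    def forward(self, x):
        # x: (batch, num_segments, embed_dim) -> (batch, embed_dim, num_segments)
        h = self.act(self.conv(x.transpose(1, 2)))
        h = self.pool(h).squeeze(-1)          # (batch, 128)
        return self.out(h)                    # the sentence's output vector

net = TextCNN()
print(net(torch.randn(1, 16, 100)).shape)     # torch.Size([1, 64])
```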
- Neural networks usually go through the training phase and the use phase.
- the neural network is trained using training data, which includes input data and expected output data.
- input data is input into the neural network to obtain output data.
- the parameters inside the neural network are adjusted by comparison with the expected output data.
- the trained neural network can then be used to perform tasks such as image recognition or semantic recognition; that is, input data is fed into the trained neural network to obtain the corresponding output data.
- in step 306, a reference sentence that is semantically similar to the sentence to be recognized is determined based on the output vector of the sentence to be recognized.
- in step 307, the semantics of the sentence to be recognized are identified as the semantics of the reference sentence.
- the steps 306 and 307 can be performed, for example, by the semantic server 140 of the dialog system 100 shown in FIG. 1 or a semantic server in other systems.
- the output vector produced by the neural network for the sentence to be recognized may be used directly to determine a reference sentence that is semantically similar to that sentence.
- the reference sentence may be, for example, a question sentence from a question-answer database.
- the database may include a plurality of question sentences that may arise in the dialog system 100, together with the reply corresponding to each question sentence.
- the database may be stored, for example, in a memory associated with the semantic server 140 or in a memory accessible by the semantic server 140.
- the neural network can be used in step 305 to obtain the output vector of the sentence to be recognized.
- a sentence vector for each question statement in the database (which may be obtained by step 304 above) is input to the neural network to obtain an output vector for each question statement.
- it is then determined whether the sentence to be recognized is semantically similar to one of the question sentences. If so, the reply corresponding to that question sentence may be obtained from the database and provided to the user as the reply to the sentence to be recognized.
- the reference sentence may also come, for example, from the search sentence library of a search system.
- the search statement library can include a large number of search statements that may be involved in the search system.
- the neural network can be used in step 305 to obtain an output vector of the statement to be recognized.
- a sentence vector of each search sentence in the search sentence library (which can be obtained by the same steps described above) is input to the neural network to obtain an output vector of each search sentence.
- it is determined whether the sentence to be recognized is semantically similar to a certain search sentence; if so, that search sentence may be presented to the user as a replacement for the user's input, which may contain erroneous pinyin.
- the step 306 of identifying whether the sentence to be recognized and a reference sentence are semantically similar, by comparing their output vectors, may include the following sub-steps:
- sub-step 1: calculate the distance between the output vector of the sentence to be recognized and the output vector of a candidate reference sentence in the reference sentence set.
- sub-step 2: when the distance is less than a threshold, determine the candidate reference sentence as a reference sentence that is semantically similar to the sentence to be recognized.
- the distance may be, for example, a cosine distance (also called cosine similarity), a Euclidean distance, or a Mahalanobis distance.
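A sketch of this matching step using cosine similarity follows (so the test becomes "similarity above a threshold" rather than "distance below a threshold"); the threshold value is an illustrative assumption, as the patent names none:

```python
# A sketch of step 306: matching output vectors by cosine similarity.
# The threshold 0.9 is an illustrative assumption.
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_reference(query_vec, reference_vecs, threshold=0.9):
    """Return the index of the best-matching reference sentence, or None."""
    best_i, best_sim = None, threshold
    for i, ref in enumerate(reference_vecs):
        sim = cosine_similarity(query_vec, ref)
        if sim >= best_sim:
            best_i, best_sim = i, sim
    return best_i
```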
- the word embedding model used in the step 302 can be a trained word embedding model.
- the neural network used in the step 305 can be a trained neural network.
- the semantic recognition method may further include implementing a training process for the word embedding model and a training process for the neural network.
- the training process for the word embedding model can be completed prior to step 302 of using the word embedding model.
- the training process for the neural network can be completed prior to step 305 of using the neural network.
- These training processes may be performed by, for example, the semantic server 140 in the dialog system 100 shown in FIG. 1, or may also be performed by a semantic server in other systems.
- with the technical solution of the embodiments of the present disclosure, sentences whose pinyin sequences are highly similar in pronunciation to that of the sentence to be recognized can be matched, thereby removing the interference caused by same-pronunciation, different-meaning words that appear during speech recognition or spelling. This improves the accuracy of speech understanding or pinyin input.
- the pre-processing steps required according to the technical solution of the embodiments of the present disclosure are simple and efficient, and thus are a low-cost solution.
- Fig. 4 illustrates an exemplary training process for the word embedding model in a semantic recognition method, in accordance with an embodiment of the present disclosure.
- the training process for the word embedding model in the semantic recognition method includes the following steps:
- in step 401, the word embedding model is trained using the first training data.
- the first training data includes a pinyin sequence of a plurality of training sentences.
- the first training data can be generated, for example, by acquiring a large number of sentences from a text corpus, converting each sentence into a pinyin sequence, and obtaining a plurality of pinyin segments in the pinyin sequence of each sentence.
- the pinyin segments may be, for example, the pinyin of each character (or word), or pinyin segments formed by further splitting the pinyin of each character (or word) into initials and finals.
- the text corpus may for example be a text corpus for a particular kind of dialog system.
- the statement in the text corpus is the statement used in the particular kind of dialog system.
- for example, a text corpus for a dialog system providing technical support for a certain product or type of product will include the various sentences used in the technical support process for that product or product type.
- the text corpus may also be a corpus of statements used in some other context.
- the text corpus may also be a corpus of common sentences in a language (eg, Chinese, English).
- the pinyin segments in the pinyin sequence of each sentence in the first training data are input into the word embedding model.
- the word embedding model outputs a word vector for each pinyin piece in the pinyin sequence of each sentence.
- during training, the parameters of the word embedding model are continuously adjusted such that the word vectors of pinyin segments having a common context in the first training data (e.g., appearing in the same sentence within a specified distance of each other) are located closer together in the vector space.
- the trained word embedding model can thus output word vectors that are close together for pinyin segments having a common context.
- the word embedding model can be used in the step 302.
- Fig. 5 illustrates an exemplary training process for the neural network in a semantic recognition method, in accordance with an embodiment of the present disclosure.
- the training process for the neural network in the semantic recognition method includes the following steps:
- a pinyin sequence of each training sentence in at least one set of training sentences is obtained.
- the semantics of the training statements in each set of training statements are similar.
- for example, the training sentence "Who drew this picture" and the training sentence "Who is the author of this picture" form a set of semantically similar training sentences.
- the at least one set of training statements can be derived, for example, from a text corpus.
- the text corpus may for example be a text corpus for a particular kind of dialog system.
- the statement in the text corpus is the statement used in the particular kind of dialog system.
- for example, a text corpus for a dialog system providing technical support for a certain product or type of product will include the various sentences used in the technical support process for that product or product type.
- the text corpus may also be a corpus of statements used in some other context.
- the text corpus may also be a corpus of common sentences in a language (eg, Chinese, English).
- each training sentence can be converted to a pinyin sequence. Then, a plurality of pinyin segments in the Pinyin sequence of each training sentence are obtained.
- the pinyin segments may be, for example, the pinyin of each character (or word), or pinyin segments formed by further splitting the pinyin of each character (or word) into initials and finals.
- a word vector of each pinyin segment in the pinyin sequence of each training sentence is obtained.
- the word vector of each Pinyin fragment in the Pinyin sequence is obtained using the word embedding model.
- the word embedding model may be, for example, a word embedding model trained in the above step 401.
- the word vectors of each pinyin segment in the Pinyin sequence of each training sentence are combined into a sentence vector for each training sentence.
- Each element of the sentence vector of each training sentence is a word vector of each pinyin segment in the Pinyin sequence of each training sentence.
- the sentence vector can be a multi-dimensional vector.
- the neural network is trained using the sentence vector of each training sentence in the at least one set of training sentences.
- a sentence vector of each of the set of semantically similar training sentences is input to the neural network to obtain an output of the neural network.
- the internal parameters of the neural network are then adjusted with the goal of making the output vectors of all training sentences in the set identical.
- after such training, the neural network can output the same or similar results for sentences that are semantically identical or similar but textually different, thereby acquiring semantic recognition capability.
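One way to realize "the same output vector for every sentence in a group" is to penalize pairwise differences between the outputs within a group; the PyTorch sketch below assumes a pairwise MSE loss, since the patent does not specify a loss function. In practice a contrastive term pushing dissimilar groups apart would also be needed to avoid the trivial solution where every sentence maps to one constant vector, a point the patent text does not address.

```python
# A sketch of the neural-network training objective of FIG. 5.
# Assumption: pairwise MSE within each group of similar sentences.
import itertools
import torch
import torch.nn.functional as F

def group_loss(net, group_matrices):
    """Pull the output vectors of one group of similar sentences together."""
    outputs = [net(m.unsqueeze(0)) for m in group_matrices]
    loss = torch.zeros(())
    for a, b in itertools.combinations(outputs, 2):
        loss = loss + F.mse_loss(a, b)
    return loss

# optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
# for group in training_groups:        # each group: list of sentence matrices
#     optimizer.zero_grad()
#     group_loss(net, group).backward()
#     optimizer.step()
```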
- the semantic recognition method has been described above with reference to the accompanying drawings, and it is to be noted that the above description is only an example, and is not a limitation of the present disclosure.
- the method may have more, fewer, or different steps, and the order, inclusion, and function relationships between the various steps may differ from those described and illustrated.
- multiple functions that are typically completed in one step can also be performed in a number of separate steps.
- Multiple steps to perform different functions can be combined into one step to perform these functions.
- Some steps can be performed in any order or in parallel. All such variations are within the spirit and scope of the disclosure.
- a semantic recognition apparatus is also provided.
- FIG. 6 shows a schematic structural block diagram of a semantic recognition apparatus 600 in accordance with an embodiment of the present disclosure.
- the functions or operations performed by the components in the semantic recognition device 600 correspond to at least some of the above-described semantic recognition methods according to embodiments of the present disclosure.
- the semantic recognition device is implemented by, for example, the semantic server 140 in the dialog system 100 shown in FIG. 1, or by a semantic server in other systems.
- the semantic recognition device may be implemented, for example, by a combination of general-purpose computer hardware (the processor, memory, etc. of the semantic server) and semantic recognition software. When the memory loads the semantic recognition software into the processor and the software is executed by the processor, the components of the semantic recognition device are formed and perform their functions or operations.
- the semantic recognition apparatus 600 includes a Pinyin sequence obtaining module 601, a word embedding module 602, a sentence vector obtaining module 603, a neural network module 604, and a semantic recognition module 605.
- the Pinyin Sequence Acquisition Module 601 is configured to obtain a Pinyin sequence of the sentence to be recognized.
- the word embedding module 602 is configured to obtain word vectors for the plurality of pinyin segments.
- the sentence vector obtaining module 603 is configured to combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized.
- the neural network module 604 is configured to obtain an output vector of the statement to be recognized using a neural network based on a sentence vector of the sentence to be recognized.
- the semantic recognition module 605 is configured to determine a reference sentence that is semantically similar to the sentence to be recognized based on an output vector of the sentence to be recognized, and to identify a semantic of the sentence to be recognized as a semantic of the reference sentence.
- the pinyin segment is a pinyin of a word in the sentence to be recognized.
- the semantic recognition apparatus further includes:
- the splitting module 606 is configured to split the pinyin of each word in the sentence to be recognized into an initial and a final, which serve as the pinyin segments of the pinyin sequence.
- the semantic recognition module 605 is further configured to: calculate the distance between the output vector of the sentence to be recognized and the output vector of a candidate reference sentence in the reference sentence set; and, when the distance is less than a threshold, determine the candidate reference sentence as a reference sentence that is semantically similar to the sentence to be recognized.
- the word embedding model is a Word2vec model.
- the word embedding module is further configured to be trained using the first training data.
- the first training data includes a pinyin sequence of a plurality of training sentences.
- the Pinyin Sequence Acquisition Module 601 is further configured to obtain a Pinyin sequence of words in each of the at least one set of second training sentences.
- the semantics of the training statements in each set of second training sentences are similar.
- the word embedding module 602 is further configured to obtain a word vector for each of the pinyin segments of each training sentence.
- the sentence vector obtaining module 603 is further configured to combine the word vectors of each pinyin segment in the Pinyin sequence of each training sentence into a sentence vector of each training sentence.
- the neural network module 604 is further configured to train the neural network using a sentence vector of each training statement such that the neural network has the same output vector for each training statement.
- the Pinyin Sequence Acquisition Module 601 is further configured to obtain a Pinyin sequence of a sentence to be recognized input by a user through a Pinyin input method.
- FIG. 7 shows a schematic structural block diagram of a semantic recognition apparatus 700 according to an embodiment of the present disclosure.
- the apparatus 700 can include a processor 701 and a memory 702 that stores a computer program.
- when the computer program is executed by the processor 701, the apparatus 700 is caused to perform the steps of the semantic recognition method shown in FIG. 3. That is, the apparatus 700 can obtain a pinyin sequence of the sentence to be recognized.
- the pinyin sequence includes a plurality of pinyin segments.
- the device 700 can then obtain the word vectors for the plurality of pinyin segments.
- the apparatus 700 may combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized.
- Apparatus 700 can then obtain an output vector of the statement to be recognized using a neural network based on the sentence vector of the statement to be recognized.
- the device 700 may determine a reference sentence that is semantically similar to the statement to be recognized based on an output vector of the statement to be recognized.
- Apparatus 700 can identify the semantics of the statement to be recognized as the semantics of the reference statement.
- the processor 701 may be, for example, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a processor based on a multi-core processor architecture, or the like.
- Memory 702 can be any type of memory implemented using data storage techniques including, but not limited to, random access memory, read only memory, semiconductor based memory, flash memory, magnetic disk memory, and the like.
- device 700 may also include an input device 703, such as a keyboard, mouse, microphone, etc., for entering a statement to be recognized. Additionally, device 700 can also include an output device 704, such as a display or the like, for outputting a reply.
- the apparatus 700 may determine a reference sentence that is semantically similar to the sentence to be recognized by calculating the distance between the output vector of the sentence to be recognized and the output vector of a candidate reference sentence in the reference sentence set, and, when the distance is less than a threshold, determining the candidate reference sentence as a reference sentence that is semantically similar to the sentence to be recognized.
- apparatus 700 may also train the word embedding model using first training data.
- the first training data includes a pinyin sequence of a plurality of training sentences.
- the apparatus 700 may also obtain a pinyin sequence for each training sentence in at least one set of training sentences, wherein the training sentences in each set are semantically similar. For each set of training sentences, the apparatus 700 may further: obtain a word vector of each pinyin segment in the pinyin sequence of each training sentence; combine the word vectors of the pinyin segments in the pinyin sequence of each training sentence into a sentence vector of that training sentence; and train the neural network using the sentence vector of each training sentence such that the neural network produces the same output vector for every training sentence.
- the apparatus 700 may obtain a Pinyin sequence of a sentence to be recognized by obtaining a Pinyin sequence of a sentence to be recognized input by a user through a Pinyin input method.
- the apparatus 700 may obtain the pinyin sequence of a sentence to be recognized by: obtaining the voice information of the sentence to be recognized uttered by the user; performing voice recognition on the voice information to obtain text information corresponding to the voice information; and converting the text information into the pinyin sequence of the sentence to be recognized.
- the device may have more, fewer, or different modules, and the connection, inclusion, and functional relationships between the various modules may differ from those described and illustrated.
- multiple functions performed by one module can also be performed by multiple separate modules. Multiple modules that perform different functions can be combined into one module that performs these functions.
- the functions performed by one module can also be performed by another module. All such variations are within the spirit and scope of the disclosure.
- a human-machine dialog system is also provided.
- the human-machine dialog system may be, for example, the human-machine dialog system 100 shown in FIG. 1, or a part thereof or a variant thereof.
- the human-machine dialog system may include: an acquisition device, a semantic recognition device 600 or 700 according to any one of the embodiments of the present disclosure, and an output device.
- the obtaining means is configured to acquire a statement to be recognized from the user.
- the output device is configured to, in response to determining a reference statement that is semantically similar to the statement to be recognized, obtain a reply associated with the reference statement and output the reply to the user.
- a computer readable storage medium storing computer executable instructions.
- the computer executable instructions when executed by a computer, cause the computer to perform a semantic recognition method in accordance with any one of the embodiments of the present disclosure.
- in still another aspect of the present disclosure, a computer system is provided that includes a processor and a memory coupled to the processor.
- Program instructions are stored in the memory, the processor being configured to perform a semantic recognition method according to any one of the embodiments of the present disclosure by loading and executing program instructions in the memory.
- the computer system may also include other components, such as various input and output components, communication components, etc., as such components may be components of existing computer systems and therefore will not be described again.
- in summary, during the training phase, text information is converted into pinyin.
- the pinyin of each word is further divided into two parts, an initial and a final, and word embedding is then applied.
- after the text information has been converted into a sentence vector, it is used to train the neural network.
- when the service is provided, the text information is converted into a pinyin sequence, and a forward pass of the neural network is then used to obtain the sentence with the highest similarity as the matching result.
- this approach tolerates more erroneous words and removes the interference caused by same-pronunciation, different-meaning words in speech recognition or spelling, while the original network design can be kept unchanged with only simple pre-processing added.
- the technical solution provided by the embodiments of the present disclosure ultimately improves the accuracy of semantic understanding in the entire system, and is a low-cost solution.
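Putting the pieces together, the serving-time flow summarized above might be sketched as follows, reusing the hypothetical helpers defined in the earlier sketches (to_pinyin_sequence, split_syllable, sentence_vector, TextCNN, find_reference); none of these names come from the patent:

```python
# An end-to-end sketch of the serving phase; every helper it calls is a
# hypothetical function from the earlier sketches, not the patent's API.
import torch

def answer(text, w2v_model, net, reference_vecs, replies):
    syllables = to_pinyin_sequence(text)
    segments = [p for s in syllables for p in split_syllable(s)]
    matrix = torch.from_numpy(sentence_vector(segments, w2v_model))
    out = net(matrix.unsqueeze(0)).squeeze(0).detach().numpy()
    i = find_reference(out, reference_vecs)
    return replies[i] if i is not None else None
```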
- the semantic recognition method and apparatus and human-machine dialog system may be implemented by hardware, software, firmware, or any combination thereof.
- the semantic recognition method and apparatus and the human-machine dialog system according to embodiments of the present disclosure may be implemented in a centralized manner in one computer system, or in a distributed manner in which different components are distributed over several interconnected computer systems.
- a typical combination of hardware and software can be a general purpose computer system with a computer program.
- the program code modules in the computer program correspond to the modules in the semantic recognition apparatus according to an embodiment of the present disclosure; when the computer program is loaded and executed, it controls the computer system to perform the operations and functions of each module in the semantic recognition apparatus.
Claims (21)
- 1. A method for semantic recognition, comprising: obtaining a pinyin sequence of a sentence to be recognized, the pinyin sequence comprising a plurality of pinyin segments; obtaining word vectors of the plurality of pinyin segments; combining the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized; obtaining an output vector of the sentence to be recognized using a neural network based on the sentence vector of the sentence to be recognized; determining, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized; and identifying the semantics of the sentence to be recognized as the semantics of the reference sentence.
- 2. The method of claim 1, wherein the pinyin segments are the pinyin of words in the sentence to be recognized.
- 3. The method of claim 1, wherein the pinyin segments are the pinyin letters of words in the sentence to be recognized.
- 4. The method of claim 1, wherein determining, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized comprises: calculating the distance between the output vector of the sentence to be recognized and the output vector of a candidate reference sentence in a reference sentence set; and when the distance is less than a threshold, determining the candidate reference sentence as a reference sentence that is semantically similar to the sentence to be recognized.
- 5. The method of claim 1, wherein the word vectors of the plurality of pinyin segments are obtained using a word embedding model.
- 6. The method of claim 5, further comprising: training the word embedding model using first training data, wherein the first training data comprises pinyin sequences of a plurality of training sentences.
- 7. The method of claim 1, further comprising: obtaining a pinyin sequence of each training sentence in at least one set of training sentences, wherein the training sentences in each set are semantically similar; and for each set of training sentences: obtaining a word vector of each pinyin segment in the pinyin sequence of each training sentence; combining the word vectors of the pinyin segments in the pinyin sequence of each training sentence into a sentence vector of that training sentence; and training the neural network using the sentence vector of each training sentence such that the neural network produces the same output vector for each training sentence.
- 8. The method of claim 1, wherein obtaining a pinyin sequence of a sentence to be recognized comprises: obtaining the pinyin sequence of a sentence to be recognized input by a user through a pinyin input method.
- 9. The method of claim 1, wherein obtaining a pinyin sequence of a sentence to be recognized comprises: obtaining voice information of a sentence to be recognized uttered by a user; performing voice recognition on the voice information to obtain text information corresponding to the voice information; and converting the text information into the pinyin sequence of the sentence to be recognized.
- 10. An apparatus for semantic recognition, comprising: at least one processor; and at least one memory storing a computer program; wherein, when the computer program is executed by the at least one processor, the apparatus is caused to: obtain a pinyin sequence of a sentence to be recognized, the pinyin sequence comprising a plurality of pinyin segments; obtain word vectors of the plurality of pinyin segments; combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized; obtain an output vector of the sentence to be recognized using a neural network based on the sentence vector of the sentence to be recognized; determine, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized; and identify the semantics of the sentence to be recognized as the semantics of the reference sentence.
- 11. The apparatus of claim 10, wherein the pinyin segments are the pinyin of words in the sentence to be recognized.
- 12. The apparatus of claim 10, wherein the pinyin segments are the pinyin letters of words in the sentence to be recognized.
- 13. The apparatus of claim 10, wherein the computer program, when executed by the at least one processor, causes the apparatus to determine a reference sentence that is semantically similar to the sentence to be recognized based on the output vector of the sentence to be recognized by: calculating the distance between the output vector of the sentence to be recognized and the output vector of a candidate reference sentence in a reference sentence set; and when the distance is less than a threshold, determining the candidate reference sentence as a reference sentence that is semantically similar to the sentence to be recognized.
- 14. The apparatus of claim 10, wherein the word vectors of the plurality of pinyin segments are obtained using a word embedding model.
- 15. The apparatus of claim 14, wherein the computer program, when executed by the at least one processor, further causes the apparatus to: train the word embedding model using first training data, wherein the first training data comprises pinyin sequences of a plurality of training sentences.
- 16. The apparatus of claim 10, wherein the computer program, when executed by the at least one processor, further causes the apparatus to: obtain a pinyin sequence of each training sentence in at least one set of training sentences, wherein the training sentences in each set are semantically similar; and for each set of training sentences: obtain a word vector of each pinyin segment in the pinyin sequence of each training sentence; combine the word vectors of the pinyin segments in the pinyin sequence of each training sentence into a sentence vector of that training sentence; and train the neural network using the sentence vector of each training sentence such that the neural network produces the same output vector for each training sentence.
- 17. The apparatus of claim 10, wherein the computer program, when executed by the at least one processor, causes the apparatus to obtain the pinyin sequence of a sentence to be recognized by: obtaining the pinyin sequence of a sentence to be recognized input by a user through a pinyin input method.
- 18. The apparatus of claim 10, wherein the computer program, when executed by the at least one processor, causes the apparatus to obtain the pinyin sequence of a sentence to be recognized by: obtaining voice information of a sentence to be recognized uttered by a user; performing voice recognition on the voice information to obtain text information corresponding to the voice information; and converting the text information into the pinyin sequence of the sentence to be recognized.
- 19. An apparatus for semantic recognition, comprising: a pinyin sequence obtaining module configured to obtain a pinyin sequence of a sentence to be recognized; a word embedding module configured to obtain word vectors of the plurality of pinyin segments; a sentence vector obtaining module configured to combine the word vectors of the plurality of pinyin segments into a sentence vector of the sentence to be recognized; a neural network module configured to obtain an output vector of the sentence to be recognized using a neural network based on the sentence vector of the sentence to be recognized; and a semantic recognition module configured to determine, based on the output vector of the sentence to be recognized, a reference sentence that is semantically similar to the sentence to be recognized, and to identify the semantics of the sentence to be recognized as the semantics of the reference sentence.
- 20. A system for human-machine dialog, comprising: an acquisition device configured to acquire a sentence to be recognized from a user; an apparatus for semantic recognition according to any one of claims 10-18; and an output device configured to, in response to determining a reference sentence that is semantically similar to the sentence to be recognized, obtain a reply associated with the reference sentence and output the reply to the user.
- 21. A computer readable storage medium storing computer executable instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/464,381 US11100921B2 (en) | 2018-04-19 | 2018-11-27 | Pinyin-based method and apparatus for semantic recognition, and system for human-machine dialog |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810354766.8A CN108549637A (zh) | 2018-04-19 | 2018-04-19 | 基于拼音的语义识别方法、装置以及人机对话系统 |
CN201810354766.8 | 2018-04-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019200923A1 true WO2019200923A1 (zh) | 2019-10-24 |
Family
ID=63515638
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/117626 WO2019200923A1 (zh) | 2018-04-19 | 2018-11-27 | 基于拼音的语义识别方法、装置以及人机对话系统 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11100921B2 (zh) |
CN (1) | CN108549637A (zh) |
WO (1) | WO2019200923A1 (zh) |
Families Citing this family (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549637A (zh) | 2018-04-19 | 2018-09-18 | 京东方科技集团股份有限公司 | Pinyin-based semantic recognition method and apparatus, and human-machine dialog system |
CN109446521B (zh) * | 2018-10-18 | 2023-08-25 | 京东方科技集团股份有限公司 | Named entity recognition method and apparatus, electronic device, and machine-readable storage medium |
CN109299269A (zh) * | 2018-10-23 | 2019-02-01 | 阿里巴巴集团控股有限公司 | Text classification method and apparatus |
CN109657229A (zh) * | 2018-10-31 | 2019-04-19 | 北京奇艺世纪科技有限公司 | Intent recognition model generation method, and intent recognition method and apparatus |
CN109684643B (zh) * | 2018-12-26 | 2021-03-12 | 湖北亿咖通科技有限公司 | Sentence-vector-based text recognition method, electronic device, and computer-readable medium |
US11250221B2 (en) * | 2019-03-14 | 2022-02-15 | Sap Se | Learning system for contextual interpretation of Japanese words |
CN109918681B (zh) * | 2019-03-29 | 2023-01-31 | 哈尔滨理工大学 | Question semantic matching method fusing Chinese characters and pinyin |
CN110097880A (zh) * | 2019-04-20 | 2019-08-06 | 广东小天才科技有限公司 | Speech-recognition-based answer judgment method and apparatus |
CN111862961A (zh) * | 2019-04-29 | 2020-10-30 | 京东数字科技控股有限公司 | Method and apparatus for recognizing speech |
CN112037776A (zh) * | 2019-05-16 | 2020-12-04 | 武汉Tcl集团工业研究院有限公司 | Speech recognition method, speech recognition apparatus, and terminal device |
CN112151018B (zh) * | 2019-06-10 | 2024-10-29 | 阿里巴巴集团控股有限公司 | Speech evaluation and speech recognition method, apparatus, device, and storage medium |
CN110288980A (zh) * | 2019-06-17 | 2019-09-27 | 平安科技(深圳)有限公司 | Speech recognition method, model training method, apparatus, device, and storage medium |
FR3098000B1 (fr) * | 2019-06-27 | 2022-05-13 | Ea4T | Method and device for obtaining an answer to a spoken question posed to a human-machine interface |
US11170175B1 (en) * | 2019-07-01 | 2021-11-09 | Intuit, Inc. | Generating replacement sentences for a particular sentiment |
CN110473540B (zh) * | 2019-08-29 | 2022-05-31 | 京东方科技集团股份有限公司 | Voice interaction method and system, terminal device, computer device, and medium |
CN110705274B (zh) * | 2019-09-06 | 2023-03-24 | 电子科技大学 | Fusion-type word-sense embedding method based on real-time learning |
CN110909534B (zh) * | 2019-11-08 | 2021-08-24 | 北京华宇信息技术有限公司 | Deep-learning evaluation model, and input-method pinyin error correction method and apparatus |
EP4080399A4 (en) * | 2019-12-18 | 2022-11-23 | Fujitsu Limited | Information processing program, information processing method, and information processing device |
CN110990632B (zh) * | 2019-12-19 | 2023-05-02 | 腾讯科技(深圳)有限公司 | Video processing method and apparatus |
CN111192572A (zh) * | 2019-12-31 | 2020-05-22 | 斑马网络技术有限公司 | Semantic recognition method, apparatus, and system |
CN111145734A (zh) * | 2020-02-28 | 2020-05-12 | 北京声智科技有限公司 | Speech recognition method and electronic device |
CN113539247B (zh) * | 2020-04-14 | 2024-06-18 | 京东科技控股股份有限公司 | Speech data processing method, apparatus, device, and computer-readable storage medium |
CN111696535B (zh) * | 2020-05-22 | 2021-10-26 | 百度在线网络技术(北京)有限公司 | Voice-interaction-based information verification method, apparatus, device, and computer storage medium |
KR20210087098A (ko) | 2020-05-22 | 2021-07-09 | 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 | Voice-interaction-based information verification method, apparatus, device, computer storage medium, and computer program product |
CN111832308B (zh) * | 2020-07-17 | 2023-09-08 | 思必驰科技股份有限公司 | Method and apparatus for coherence processing of speech-recognized text |
CN114091408A (zh) * | 2020-08-04 | 2022-02-25 | 科沃斯商用机器人有限公司 | Text correction and model training methods, correction model, device, and robot |
CN111986653B (zh) * | 2020-08-06 | 2024-06-25 | 杭州海康威视数字技术股份有限公司 | Voice intent recognition method, apparatus, and device |
CN112149680B (zh) * | 2020-09-28 | 2024-01-16 | 武汉悦学帮网络技术有限公司 | Typo detection and recognition method and apparatus, electronic device, and storage medium |
CN112259182B (zh) * | 2020-11-05 | 2023-08-11 | 中国联合网络通信集团有限公司 | Electronic medical record generation method and apparatus |
CN112767924A (zh) * | 2021-02-26 | 2021-05-07 | 北京百度网讯科技有限公司 | Speech recognition method and apparatus, electronic device, and storage medium |
CN113035200B (zh) * | 2021-03-03 | 2022-08-05 | 科大讯飞股份有限公司 | Speech recognition error correction method, apparatus, and device for human-machine interaction scenarios |
CN113268974B (zh) * | 2021-05-18 | 2022-11-29 | 平安科技(深圳)有限公司 | Polyphonic-character pronunciation annotation method, apparatus, device, and storage medium |
CN113284499B (zh) * | 2021-05-24 | 2024-07-12 | 亿咖通(湖北)技术有限公司 | Voice command recognition method and electronic device |
CN113345429B (zh) * | 2021-06-18 | 2022-03-29 | 图观(天津)数字科技有限公司 | Semantic analysis method and system for complex scenarios |
CN113655893B (zh) * | 2021-07-08 | 2024-06-18 | 华为技术有限公司 | Word and sentence generation method, model training method, and related devices |
CN113781998B (zh) * | 2021-09-10 | 2024-06-07 | 河南松音科技有限公司 | Speech recognition method, apparatus, device, and medium based on a dialect correction model |
CN114048751A (zh) * | 2021-11-08 | 2022-02-15 | 北京明略软件系统有限公司 | Method and apparatus for determining pinyin letter vectors, electronic device, and storage medium |
CN114360517B (zh) * | 2021-12-17 | 2023-04-18 | 天翼爱音乐文化科技有限公司 | Audio processing method and apparatus for complex environments, and storage medium |
CN116312968A (zh) * | 2023-02-09 | 2023-06-23 | 广东德澳智慧医疗科技有限公司 | Psychological counseling and healing system based on human-machine dialog and core algorithms |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6848080B1 (en) * | 1999-11-05 | 2005-01-25 | Microsoft Corporation | Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors |
EP1627325B1 (en) * | 2003-05-28 | 2011-07-27 | LOQUENDO SpA | Automatic segmentation of texts comprising chunks without separators |
US8095364B2 (en) * | 2004-06-02 | 2012-01-10 | Tegic Communications, Inc. | Multimodal disambiguation of speech recognition |
US9471566B1 (en) * | 2005-04-14 | 2016-10-18 | Oracle America, Inc. | Method and apparatus for converting phonetic language input to written language output |
US20060282255A1 (en) * | 2005-06-14 | 2006-12-14 | Microsoft Corporation | Collocation translation from monolingual and available bilingual corpora |
US8204751B1 (en) * | 2006-03-03 | 2012-06-19 | At&T Intellectual Property Ii, L.P. | Relevance recognition for a human machine dialog system contextual question answering based on a normalization of the length of the user input |
US8862988B2 (en) * | 2006-12-18 | 2014-10-14 | Semantic Compaction Systems, Inc. | Pictorial keyboard with polysemous keys for Chinese character output |
JP2010531492A (ja) * | 2007-06-25 | 2010-09-24 | グーグル・インコーポレーテッド | Word probability determination |
US20140330865A1 (en) * | 2011-11-30 | 2014-11-06 | Nokia Corporation | Method and apparatus for providing address geo-coding |
KR101394253B1 (ko) * | 2012-05-16 | 2014-05-13 | 광주과학기술원 | Apparatus for correcting speech recognition errors |
KR101364774B1 (ko) * | 2012-12-07 | 2014-02-20 | 포항공과대학교 산학협력단 | Method and apparatus for correcting errors in speech recognition |
CN103678675A (zh) | 2013-12-25 | 2014-03-26 | 乐视网信息技术(北京)股份有限公司 | Method, server, and system for searching by pinyin |
US20170206004A1 (en) * | 2014-07-15 | 2017-07-20 | Amar Y Servir | Input of characters of a symbol-based written language |
CN104298429B (zh) * | 2014-09-25 | 2018-05-04 | 北京搜狗科技发展有限公司 | Input-based information display method and input-method system |
CN105874874B (zh) * | 2014-12-09 | 2021-01-29 | 华为技术有限公司 | Information processing method and apparatus |
US9965045B2 (en) * | 2015-02-12 | 2018-05-08 | Hoi Chiu LO | Chinese input method using pinyin plus tones |
CN105244029B (zh) * | 2015-08-28 | 2019-02-26 | 安徽科大讯飞医疗信息技术有限公司 | Speech recognition post-processing method and system |
CN106683677B (zh) * | 2015-11-06 | 2021-11-12 | 阿里巴巴集团控股有限公司 | Speech recognition method and apparatus |
US10509862B2 (en) * | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
CN107515850A (zh) * | 2016-06-15 | 2017-12-26 | 阿里巴巴集团控股有限公司 | Method, apparatus, and system for determining pronunciations of polyphonic characters |
KR20180055189A (ko) * | 2016-11-16 | 2018-05-25 | 삼성전자주식회사 | Natural language processing method and apparatus, and method and apparatus for training a natural language processing model |
CN107220235B (zh) * | 2017-05-23 | 2021-01-22 | 北京百度网讯科技有限公司 | Artificial-intelligence-based speech recognition error correction method, apparatus, and storage medium |
CN107451121A (zh) * | 2017-08-03 | 2017-12-08 | 京东方科技集团股份有限公司 | Speech recognition method and apparatus |
- 2018-04-19: CN application CN201810354766.8A filed; published as CN108549637A (status: pending)
- 2018-11-27: US application US16/464,381 filed; published as US11100921B2 (status: active)
- 2018-11-27: PCT application PCT/CN2018/117626 filed; published as WO2019200923A1 (status: application filing)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090210218A1 (en) * | 2008-02-07 | 2009-08-20 | Nec Laboratories America, Inc. | Deep Neural Networks and Methods for Using Same |
CN106484664A (zh) * | 2016-10-21 | 2017-03-08 | 竹间智能科技(上海)有限公司 | Method for computing similarity between short texts |
CN106897263A (zh) * | 2016-12-29 | 2017-06-27 | 北京光年无限科技有限公司 | Deep-learning-based robot dialog interaction method and apparatus |
CN107491547A (zh) * | 2017-08-28 | 2017-12-19 | 北京百度网讯科技有限公司 | Artificial-intelligence-based search method and apparatus |
CN108549637A (zh) * | 2018-04-19 | 2018-09-18 | 京东方科技集团股份有限公司 | Pinyin-based semantic recognition method and apparatus, and human-machine dialog system |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110942767A (zh) * | 2019-11-05 | 2020-03-31 | 深圳市一号互联科技有限公司 | ASR language model recognition annotation and optimization method and apparatus |
CN110942767B (zh) * | 2019-11-05 | 2023-03-17 | 深圳市一号互联科技有限公司 | ASR language model recognition annotation and optimization method and apparatus |
CN111079898A (zh) * | 2019-11-28 | 2020-04-28 | 华侨大学 | Channel coding recognition method based on a TextCNN network |
CN111079898B (zh) * | 2019-11-28 | 2023-04-07 | 华侨大学 | Channel coding recognition method based on a TextCNN network |
CN110992959A (zh) * | 2019-12-06 | 2020-04-10 | 北京市科学技术情报研究所 | Speech recognition method and system |
CN111414481A (zh) * | 2020-03-19 | 2020-07-14 | 哈尔滨理工大学 | Chinese semantic matching method based on pinyin and BERT embeddings |
CN111414481B (zh) * | 2020-03-19 | 2023-09-26 | 哈尔滨理工大学 | Chinese semantic matching method based on pinyin and BERT embeddings |
CN112133295A (zh) * | 2020-11-09 | 2020-12-25 | 北京小米松果电子有限公司 | Speech recognition method, apparatus, and storage medium |
CN112133295B (zh) * | 2020-11-09 | 2024-02-13 | 北京小米松果电子有限公司 | Speech recognition method, apparatus, and storage medium |
CN113360623A (zh) * | 2021-06-25 | 2021-09-07 | 达闼机器人有限公司 | Text matching method, electronic device, and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
US11100921B2 (en) | 2021-08-24 |
CN108549637A (zh) | 2018-09-18 |
US20200335096A1 (en) | 2020-10-22 |
US20210264903A9 (en) | 2021-08-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019200923A1 (zh) | Pinyin-based semantic recognition method and apparatus, and human-machine dialog system | |
US11238845B2 (en) | Multi-dialect and multilingual speech recognition | |
US11568855B2 (en) | System and method for defining dialog intents and building zero-shot intent recognition models | |
US10540964B2 (en) | Method and apparatus for processing natural language, method and apparatus for training natural language processing model | |
US10437929B2 (en) | Method and system for processing an input query using a forward and a backward neural network specific to unigrams | |
US9805718B2 (en) | Clarifying natural language input using targeted questions | |
CN117521675A (zh) | Information processing method, apparatus, device, and storage medium based on a large language model | |
US11907665B2 (en) | Method and system for processing user inputs using natural language processing | |
US11615787B2 (en) | Dialogue system and method of controlling the same | |
CN113157959A (zh) | Cross-modal retrieval method, apparatus, and system based on multi-modal topic supplementation | |
CN110717021A (zh) | Obtaining input text in an artificial-intelligence interview, and related apparatus | |
Alrumiah et al. | Intelligent Quran Recitation Recognition and Verification: Research Trends and Open Issues | |
CN113051384A (zh) | Dialog-based user profile extraction method and related apparatus | |
CN113609873A (zh) | Translation model training method, apparatus, and medium | |
JP2019204415A (ja) | Paraphrase generation method, paraphrase apparatus, and program | |
Latha et al. | Visual audio summarization based on NLP models | |
Hattimare et al. | Maruna Bot: An extensible retrieval-focused framework for task-oriented dialogues | |
US20220215834A1 (en) | System and method for speech to text conversion | |
JP7411149B2 (ja) | Learning apparatus, estimation apparatus, learning method, estimation method, and program | |
CN118377909B (zh) | Call-content-based customer label determination method, apparatus, and storage medium | |
KR102383043B1 (ko) | Ellipsis restoration learning method, recognition method, and apparatus for performing the same | |
US20240282296A1 (en) | Method and system for conditional hierarchical domain routing and intent classification for virtual assistant queries | |
Vasuki et al. | Using Pre-trained Models for Code-Switched Speech Recognition | |
Nzeyimana | KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods | |
JP2023007014A (ja) | Response system, response method, and response program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18915564; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 18915564; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2021) |