CN112259084B - Speech recognition method, device and storage medium - Google Patents
- Publication number
- CN112259084B CN112259084B CN202010597703.2A CN202010597703A CN112259084B CN 112259084 B CN112259084 B CN 112259084B CN 202010597703 A CN202010597703 A CN 202010597703A CN 112259084 B CN112259084 B CN 112259084B
- Authority
- CN
- China
- Prior art keywords
- text
- sentence
- current
- lattice
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
Abstract
The disclosure provides a speech recognition method, a speech recognition device and a storage medium, and relates to the technical field of speech recognition. A speech recognition method of the present disclosure includes: acquiring a candidate lattice according to the speech signal of the current sentence; resetting a neural network model according to the context text corresponding to the current sentence, wherein the context text is the recognition text of the one or more sentences preceding the current sentence; re-scoring the candidate lattice with the reset neural network model to obtain a re-scored lattice; and determining the recognition text of the current sentence according to the re-scored lattice. In this way, the speech recognition of the current sentence can take the information of one or more preceding sentences into account, so that prior information is used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
Description
Technical Field
The present disclosure relates to the field of speech recognition technology, and in particular, to a speech recognition method, apparatus, and storage medium.
Background
Speech recognition is a key technology in systems such as voice quality inspection and man-machine dialogue, and is widely applied in fields such as logistics, finance and industry. High recognition accuracy is essential for any speech system: in a conversation robot, for example, poor speech recognition accuracy means the speaker's true intention cannot be understood correctly, and a wrong instruction may then be issued.
Disclosure of Invention
It is an object of the present disclosure to improve the accuracy of speech recognition.
According to an aspect of some embodiments of the present disclosure, there is provided a speech recognition method, including: acquiring a candidate lattice according to the speech signal of the current sentence; resetting a neural network model according to the context text corresponding to the current sentence, wherein the context text is the recognition text of the one or more sentences preceding the current sentence, and the neural network model is generated by training on corpus samples that carry context; re-scoring the candidate lattice with the reset neural network model to obtain a re-scored lattice; and determining the recognition text of the current sentence according to the re-scored lattice.
In some embodiments, the speech recognition method further comprises: storing the recognition text of the current sentence in a buffer for use as the context text of a subsequent sentence.
In some embodiments, the speech recognition method further comprises: acquiring the context text corresponding to the current sentence from the buffer.
In some embodiments, obtaining the candidate lattice from the speech signal of the current sentence includes: decoding the speech signal in one pass based on an acoustic model and a language model to obtain the candidate lattice.
In some embodiments, determining the recognition text of the current sentence from the re-scored lattice includes: performing acoustic-weight and language-weight analysis on the re-scored lattice to obtain the decoding result of the highest-scoring path as the recognition text of the current sentence.
In some embodiments, the neural network model includes an LSTM (Long Short-Term Memory) model or a GRU (Gated Recurrent Unit) model.
In some embodiments, where the speech signal is the speech signal of a dialogue, the context text corresponding to the current sentence includes the recognition text of the most recent utterance, before the current sentence, of the previous speaker.
In some embodiments, the speech recognition method further comprises: training the neural network model using samples that carry context until the output of the loss function converges, including: acquiring a sample candidate lattice according to the speech signal of the current sample sentence; resetting the neural network model to be trained according to the context sample text corresponding to the current sample sentence, wherein the context sample text is the sample text of the one or more sentences preceding the current sample sentence; re-scoring the sample candidate lattice with the reset neural network model to be trained, obtaining a re-scored sample lattice, and determining the recognition text of the current sample sentence; and determining the output of the loss function according to the recognition text of the current sample sentence and the sample text of the current sample sentence.
In this way, the information of one or more preceding sentences can be taken into account in the speech recognition of the current sentence, so that prior information is used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
According to an aspect of other embodiments of the present disclosure, there is provided a speech recognition apparatus including: a decoding unit configured to acquire a candidate lattice according to the speech signal of the current sentence; a resetting unit configured to reset a neural network model according to the context text corresponding to the current sentence, wherein the context text is the recognition text of the one or more sentences preceding the current sentence, and the neural network model is generated by training on corpus samples that carry context; a re-scoring unit configured to re-score the candidate lattice with the reset neural network model to obtain a re-scored lattice; and a recognition unit configured to determine the recognition text of the current sentence according to the re-scored lattice.
In some embodiments, the speech recognition apparatus further comprises: a caching unit configured to store the recognition text of the current sentence in a buffer for use as the context text of a subsequent sentence.
In some embodiments, the resetting unit is further configured to acquire the context text corresponding to the current sentence from the buffer.
In some embodiments, the decoding unit is configured to decode the speech signal in one pass based on an acoustic model and a language model to obtain the candidate lattice.
In some embodiments, the recognition unit is configured to perform acoustic-weight and language-weight analysis on the re-scored lattice to obtain the decoding result of the highest-scoring path as the recognition text of the current sentence.
In some embodiments, the neural network model includes an LSTM model or a GRU model.
In some embodiments, where the speech signal is the speech signal of a dialogue, the context text corresponding to the current sentence includes the recognition text of the most recent utterance, before the current sentence, of the previous speaker.
In some embodiments, the speech recognition apparatus further comprises: a training unit configured to train the neural network model with samples that carry context until the output of the loss function converges.
According to an aspect of some embodiments of the present disclosure, there is provided a speech recognition apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform any of the speech recognition methods mentioned above based on instructions stored in the memory.
Such a device can take the information of one or more preceding sentences into account in the speech recognition of the current sentence, so that prior information is used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
According to an aspect of some embodiments of the present disclosure, a computer-readable storage medium is presented, on which computer program instructions are stored, which instructions, when executed by a processor, implement the steps of any one of the speech recognition methods mentioned above.
By executing the instructions on the computer-readable storage medium, the information of one or more preceding sentences can be taken into account in the speech recognition of the current sentence, so that prior information is used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the present disclosure, and together with the description serve to explain the present disclosure. In the drawings:
Fig. 1 is a flow chart of some embodiments of a speech recognition method of the present disclosure.
Fig. 2 is a flow chart of other embodiments of the speech recognition method of the present disclosure.
Fig. 3 is a schematic diagram of some embodiments of a speech recognition device of the present disclosure.
Fig. 4 is a schematic diagram of further embodiments of a speech recognition device of the present disclosure.
Fig. 5 is a schematic diagram of still other embodiments of a speech recognition apparatus of the present disclosure.
Detailed Description
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
A speech recognition system first performs fast decoding with a simple language model to generate a lattice network, and then re-scores the generated lattice with a complex language model to obtain higher recognition accuracy. The recognition rate obtained by one-pass decoding alone is often low; accuracy can be further improved by re-scoring with a complex language model trained on a large corpus. The language model used for re-scoring was initially a high-order n-gram model, which was later replaced by neural network language models owing to their superior modeling capability.
The inventors found that, although neural networks perform well, the related art usually re-scores based only on the relationship between adjacent words and does not consider the logic between adjacent sentences.
A flowchart of some embodiments of the speech recognition method of the present disclosure is shown in fig. 1.
In step 101, a candidate lattice is obtained from the speech signal of the current sentence.
In some embodiments, the speech signal may be decoded in one pass based on an acoustic model and a language model to obtain the candidate lattice. One-pass decoding may be performed in any manner of the related art, and the resulting original lattice network is used as the candidate lattice.
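As an illustrative sketch only (the disclosure does not prescribe a data structure), a candidate lattice can be pictured as a small directed acyclic graph whose arcs carry a word together with acoustic and language-model log-scores; the states, words, and scores below are invented for illustration:

```python
# Illustrative only: states 0..2, arcs = (word, acoustic_logprob, lm_logprob, next_state).
lattice = {
    0: [("the", -1.0, -0.5, 1)],
    1: [("order", -2.0, -1.0, 2), ("odor", -1.8, -3.0, 2)],
    2: [],  # final state
}

def enumerate_paths(lattice, state=0, prefix=(), score=0.0):
    """Yield (text, total log-score) for every complete path through the lattice."""
    if not lattice[state]:
        yield " ".join(prefix), score
        return
    for word, am, lm, nxt in lattice[state]:
        yield from enumerate_paths(lattice, nxt, prefix + (word,), score + am + lm)

paths = sorted(enumerate_paths(lattice), key=lambda p: -p[1])
print(paths[0][0])  # "the order"
```

A real first-pass decoder would build this graph from the acoustic and language models; here only the resulting structure matters.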
In step 102, the neural network model is reset according to the recognized context text corresponding to the current sentence. The context text may be the recognition text of one or more sentences preceding the current sentence, for example a predetermined number of sentences immediately preceding the current sentence, or the preceding paragraph. In some embodiments, paragraphs may be divided in time by speech intervals, or distinguished by keywords.
In some embodiments, the order of execution of steps 101, 102 may be arbitrary.
In step 103, the candidate lattice is re-scored by the reset neural network model to obtain the re-scored lattice. In some embodiments, acoustic-weight and language-weight analysis may then be performed on the re-scored lattice to obtain the decoding result of the highest-scoring path as the recognition text of the current sentence.
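The "reset, then re-score" idea can be sketched with a toy stand-in for the neural network model; the `ContextLM` class, its `reset`/`score` methods, and the word-overlap heuristic are assumptions for illustration only (the disclosure uses an LSTM or GRU whose state is conditioned on the context text):

```python
# Toy stand-in (assumption, not the patent's model): "resetting" stores the
# context, and scoring rewards hypotheses consistent with that context.
class ContextLM:
    def reset(self, context_text):
        # An LSTM/GRU would advance its hidden state over the context text;
        # here we simply remember the context vocabulary.
        self.context_words = set(context_text.split())

    def score(self, hypothesis):
        words = hypothesis.split()
        overlap = sum(1 for w in words if w in self.context_words)
        return overlap / max(len(words), 1)

lm = ContextLM()
lm.reset("what is the shipping status of my order")
candidates = ["the order has shipped", "the odor has chipped"]
rescored = max(candidates, key=lm.score)
print(rescored)  # "the order has shipped"
```

The two candidates are acoustically similar; the context makes "order" far more plausible than "odor", which is exactly the prior information the re-scoring step exploits.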
In step 104, the recognition text of the current sentence is determined based on the re-scored lattice.
In this way, the information of one or more preceding sentences can be taken into account in the speech recognition of the current sentence, so that prior information is used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
In some embodiments, where the speech signal is the speech signal of a dialogue, the context text corresponding to the current sentence includes the recognition text of the most recent utterance, before the current sentence, of the previous speaker. In some embodiments, a change of speaker may be detected based on timbre.
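A minimal sketch of this context-selection rule, assuming dialogue turns are already labeled with speaker identities (for example via the timbre-based speaker-change detection mentioned above); the function name and data layout are hypothetical:

```python
def context_for_current(turns, current_speaker):
    """Return the most recent recognized utterance by a different speaker.

    turns: (speaker_id, recognized_text) pairs, oldest first; speaker_id
    might come from timbre-based speaker-change detection.
    """
    for speaker, text in reversed(turns):
        if speaker != current_speaker:
            return text
    return ""  # no other speaker has spoken yet

turns = [
    ("customer", "what is the shipping status of my order"),
    ("agent", "let me check the system"),
]
print(context_for_current(turns, "customer"))  # "let me check the system"
```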
In this way, the question-and-answer logic of the conversation can be fully utilized, further improving the accuracy of speech recognition.
A flowchart of further embodiments of the speech recognition method of the present disclosure is shown in fig. 2.
In step 201, the speech signal is decoded in one pass based on the acoustic model and a low-order language model to obtain the candidate lattice.
In step 202, the context text corresponding to the current sentence is obtained from the buffer. In some embodiments, the corresponding context text may be retrieved from the buffer according to a predetermined policy, such as taking the recognition text of the most recent utterance of the previous speaker, or the recognition text of the previous sentence or previous paragraph.
In step 203, the neural network model is reset according to the context text obtained from the buffer.
In step 204, the candidate lattice is re-scored by the reset neural network model to obtain the re-scored lattice. In some embodiments, the neural network model includes an LSTM model or a GRU model.
In step 205, acoustic-weight and language-weight analysis is performed on the re-scored lattice to obtain the decoding result of the highest-scoring path, which is used as the recognition text of the current sentence.
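The weighted combination in step 205 can be sketched as follows; the weight values and path scores are illustrative assumptions, not values from the disclosure:

```python
def best_path(paths, acoustic_weight=1.0, lm_weight=0.8):
    """paths: (text, acoustic_logprob, lm_logprob); returns the top-scoring text."""
    def combined(p):
        _, am, lm = p
        return acoustic_weight * am + lm_weight * lm  # higher (less negative) wins
    return max(paths, key=combined)[0]

paths = [
    ("recognize speech", -12.0, -3.1),
    ("wreck a nice beach", -11.5, -7.9),
]
print(best_path(paths))  # "recognize speech"
```

Here the second hypothesis has a slightly better acoustic score but a much worse language score, so the weighted sum selects the first.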
In step 206, the recognition text of the current sentence is stored in the buffer as the context text for subsequent sentences.
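The buffer of step 206 can be sketched as a small fixed-capacity store of recent recognition texts; the class name and capacity are hypothetical:

```python
from collections import deque

class ContextBuffer:
    """Keep the recognition texts of the last few sentences for later resets."""

    def __init__(self, max_sentences=3):
        self.buf = deque(maxlen=max_sentences)  # oldest entries are dropped

    def store(self, recognized_text):
        self.buf.append(recognized_text)

    def context(self):
        return " ".join(self.buf)

buf = ContextBuffer(max_sentences=2)
for s in ["sentence one", "sentence two", "sentence three"]:
    buf.store(s)
print(buf.context())  # "sentence two sentence three"
```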
In this way, the recognized text is cached and managed in time to serve as a basis for recognizing subsequent sentences, and the neural network model is reset in time so that the current sentence can be analyzed and estimated using the context information, improving the prediction accuracy of the language model.
In some embodiments, the neural network model needs to be trained before speech recognition is performed by any of the methods above. The training corpus samples must carry context. In some embodiments, training text with context may be collected according to the target application scenario, and training ends when the result of the loss function converges to a stable state (e.g., the change in its output is less than a predetermined value). During training, a sample candidate lattice can be obtained according to the speech signal of the current sample sentence, and the neural network model is reset using the context sample text corresponding to the current sample sentence. In some embodiments, the context sample text is the sample text of the one or more sentences preceding the current sample sentence. The sample candidate lattice is then re-scored by the reset neural network model to be trained, and the best recognition text is determined.
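The convergence criterion described above (stop when the change in the loss output falls below a predetermined value) can be sketched generically; the toy quadratic loss stands in for the actual re-scoring loss, which the disclosure does not specify in closed form:

```python
def train_until_converged(loss_fn, step_fn, params, tol=1e-3, max_epochs=100):
    """Iterate step_fn until the change in loss_fn output drops below tol."""
    prev = loss_fn(params)
    for _ in range(max_epochs):
        params = step_fn(params)
        cur = loss_fn(params)
        if abs(prev - cur) < tol:  # "change in output less than a predetermined value"
            break
        prev = cur
    return params

# Toy stand-in for one training update: gradient descent on (x - 2)^2.
loss = lambda x: (x - 2.0) ** 2
step = lambda x: x - 0.1 * 2.0 * (x - 2.0)
result = train_until_converged(loss, step, 0.0)
```

With these assumed settings the parameter settles near the loss minimum at 2.0 once successive losses differ by less than the tolerance.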
In this way, the neural network model can be trained on corpus samples that carry context, so that the resulting model is capable of re-scoring using the logic between adjacent sentences, further improving the accuracy of speech recognition.
In tests with a speech test dataset, the method of the embodiments of the present disclosure reduced the PPL (perplexity) of a single-layer LSTM neural language model from 43.2 to 40.05; at the same time, lattice re-scoring improved speech recognition accuracy by an absolute 0.7%, a clear improvement.
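For reference, PPL (perplexity) is the exponential of the negative average log-probability the language model assigns to each token, so a drop from 43.2 to 40.05 means the model is on average less "surprised" by the test text:

```python
import math

def perplexity(token_log_probs):
    """PPL = exp(-(1/N) * sum of natural-log token probabilities)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that assigns every token probability 1/10 has perplexity 10.
print(perplexity([math.log(0.1)] * 5))
```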
A schematic diagram of some embodiments of the speech recognition apparatus of the present disclosure is shown in fig. 3.
The decoding unit 301 can acquire a candidate lattice from the speech signal of the current sentence. In some embodiments, the speech signal may be decoded in one pass based on an acoustic model and a language model to obtain the candidate lattice.
The resetting unit 302 can reset the neural network model according to the recognized context text corresponding to the current sentence. The context text may be the recognition text of one or more sentences preceding the current sentence, for example a predetermined number of sentences immediately preceding the current sentence, or the preceding paragraph. In some embodiments, paragraphs may be divided in time by speech intervals, or distinguished by keywords.
The re-scoring unit 303 can re-score the candidate lattice through the reset neural network model to obtain the re-scored lattice. In some embodiments, acoustic-weight and language-weight analysis may be performed on the re-scored lattice to obtain the decoding result of the highest-scoring path as the recognition text of the current sentence.
The recognition unit 304 can determine the recognition text of the current sentence from the re-scored lattice.
Such a device can take the information of one or more preceding sentences into account in the speech recognition of the current sentence, so that prior information is used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
In some embodiments, as shown in fig. 3, the speech recognition apparatus may further include a caching unit 305 capable of storing the recognition text of the current sentence in the buffer as the context text for subsequent sentences. The resetting unit 302 can obtain the context text corresponding to the current sentence from the buffer and reset the neural network model according to the obtained text. In some embodiments, the corresponding context text may be retrieved from the buffer according to a predetermined policy, such as taking the recognition text of the most recent utterance of the previous speaker, or the recognition text of the previous sentence or previous paragraph.
Such a device can cache and manage the recognized text in time to serve as a basis for recognizing subsequent sentences; it resets the neural network model in time, analyzes and estimates the current sentence using the context information, and improves the prediction accuracy of the language model.
In some embodiments, as shown in fig. 3, the speech recognition apparatus may further include a training unit 306 that can train the neural network model until the output of the loss function converges, generating the model used by the re-scoring unit 303. The corpus samples on which training is based must carry context. In some embodiments, training may be performed on the initial speech recognition apparatus shown in fig. 3: the training unit 306 inputs the corpus samples into the decoding unit 301, which obtains a sample candidate lattice according to the speech signal of the current sample sentence; the resetting unit resets the neural network model to be trained using the context sample text corresponding to the current sample sentence; the re-scoring unit re-scores the sample candidate lattice with the reset neural network model to be trained to obtain the re-scored sample lattice; and the recognition unit determines the recognition text of the current sample sentence. The training unit 306 then determines the output of the loss function from the recognition text of the current sample sentence and the sample text of the current sample sentence; if the change in the output is smaller than a predetermined value, the output is determined to have converged and training of the neural network model is complete.
Such a device can train the neural network model on corpus samples that carry context, so that the resulting model is capable of re-scoring using the logic between adjacent sentences, further improving the accuracy of speech recognition.
A schematic structural diagram of one embodiment of a speech recognition device of the present disclosure is shown in fig. 4. The speech recognition device comprises a memory 401 and a processor 402, wherein: the memory 401 may be a magnetic disk, flash memory, or any other non-volatile storage medium, and is used to store instructions for the embodiments of the speech recognition method above. The processor 402 is coupled to the memory 401 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 402 is configured to execute the instructions stored in the memory, so that prior information can be more fully utilized, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
In one embodiment, as also shown in fig. 5, the speech recognition device 500 includes a memory 501 and a processor 502. The processor 502 is coupled to the memory 501 via a bus 503. The speech recognition device 500 may also be connected to an external storage device 505 via a storage interface 504 for invoking external data, and may also be connected to a network or another computer system (not shown) via a network interface 506; these components will not be described in detail here.
In this embodiment, data and instructions are stored in the memory and the instructions are processed by the processor, so that prior information can be more fully utilized, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
In another embodiment, a computer readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method in the corresponding embodiments of the speech recognition method. It will be apparent to those skilled in the art that embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Thus far, the present disclosure has been described in detail. In order to avoid obscuring the concepts of the present disclosure, some details known in the art are not described. How to implement the solutions disclosed herein will be fully apparent to those skilled in the art from the above description.
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Finally, it should be noted that: the above embodiments are merely for illustrating the technical solution of the present disclosure and are not limiting thereof; although the present disclosure has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will appreciate that: modifications may be made to the specific embodiments of the disclosure or equivalents may be substituted for part of the technical features; without departing from the spirit of the technical solutions of the present disclosure, it should be covered in the scope of the technical solutions claimed in the present disclosure.
Claims (12)
1. A method of speech recognition, comprising:
acquiring a candidate lattice according to the speech signal of the current sentence;
resetting a neural network model according to the context text corresponding to the current sentence, wherein the context text is the recognition text of one or more sentences preceding the current sentence, the neural network model is generated by training on corpus samples that carry context, and, in the case that the speech signal is a speech signal of a dialogue, the context text corresponding to the current sentence comprises the recognition text of the most recent utterance, before the current sentence, of the previous speaker;
re-scoring the candidate lattice by the reset neural network model to obtain a re-scored lattice; and
determining the recognition text of the current sentence according to the re-scored lattice.
2. The method of claim 1, further comprising:
storing the recognition text of the current sentence in a buffer for use as the context text of a subsequent sentence.
3. The method of claim 2, further comprising:
acquiring the context text corresponding to the current sentence from the buffer.
4. The method of claim 1, wherein the acquiring a candidate lattice from the speech signal of the current sentence comprises:
decoding the speech signal in one pass based on an acoustic model and a language model to obtain the candidate lattice.
5. The method of claim 1, wherein the determining the recognition text of the current sentence from the re-scored lattice comprises:
performing acoustic-weight and language-weight analysis on the re-scored lattice to obtain the decoding result of the highest-scoring path as the recognition text of the current sentence.
6. The method of claim 1, wherein the neural network model comprises an LSTM model or a GRU model.
7. The method of any one of claims 1-6, further comprising:
training the neural network model with context-carrying samples until the output of the loss function converges, the training comprising:
acquiring a sample candidate lattice according to a speech signal of a current sample sentence;
resetting a neural network model to be trained according to a context sample text corresponding to the current sample sentence, wherein the context sample text is the sample text of one or more sentences preceding the current sample sentence;
re-scoring the sample candidate lattice with the reset neural network model to be trained to obtain a re-scored sample lattice, and determining the recognition text of the current sample sentence; and
determining the output of the loss function according to the recognition text of the current sample sentence and the sample text of the current sample sentence.
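Again for illustration only: real training per claim 7 would backpropagate a loss through the LSTM/GRU, whereas the sketch below collapses the model parameters into a single interpolation weight and uses sentence-level mismatch as the loss. The `rescore_fn` callable and the sample-record format are hypothetical names introduced for this sketch.

```python
# Loss = number of sample sentences whose re-scored recognition text
# disagrees with the reference sample text.
def training_loss(samples, rescore_fn, weight):
    loss = 0
    for sample in samples:
        recognized = rescore_fn(sample["lattice"], sample["context"], weight)
        loss += int(recognized != sample["reference"])
    return loss

def fit_weight(samples, rescore_fn, candidates=(0.0, 0.5, 1.0, 2.0)):
    # "Train until the loss converges": here, pick the candidate weight
    # that minimizes the loss over the sample set.
    return min(candidates, key=lambda w: training_loss(samples, rescore_fn, w))
```

The structure mirrors the claim: decode a sample lattice, re-score it with the context-reset model, compare the result against the reference sample text, and adjust the model to reduce the disagreement.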
8. A speech recognition apparatus comprising:
a decoding unit configured to acquire a candidate lattice according to a speech signal of a current sentence;
a resetting unit configured to reset a neural network model according to a context text corresponding to the current sentence, wherein the context text is the recognition text of one or more sentences preceding the current sentence, the neural network model is generated by training on corpus samples that carry context, and, in the case that the speech signal is a speech signal of a dialogue, the context text corresponding to the current sentence comprises the recognition text of the speaker's speech signal that precedes and is closest to the current sentence;
a re-scoring unit configured to re-score the candidate lattice with the reset neural network model to obtain a re-scored lattice; and
a recognition unit configured to determine the recognition text of the current sentence according to the re-scored lattice.
9. The apparatus of claim 8, further comprising:
a caching unit configured to store the recognition text of the current sentence in a buffer to serve as the context text of a subsequent sentence.
10. The apparatus of claim 8 or 9, further comprising:
a training unit configured to train the neural network model with context-carrying samples until the output of the loss function converges.
11. A speech recognition apparatus comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of any of claims 1-7 based on instructions stored in the memory.
12. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010597703.2A CN112259084B (en) | 2020-06-28 | 2020-06-28 | Speech recognition method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112259084A CN112259084A (en) | 2021-01-22 |
CN112259084B (en) | 2024-07-16
Family
ID=74224197
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010597703.2A Active CN112259084B (en) | 2020-06-28 | 2020-06-28 | Speech recognition method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112259084B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112885338B (en) * | 2021-01-29 | 2024-05-14 | 深圳前海微众银行股份有限公司 | Speech recognition method, device, computer-readable storage medium, and program product |
CN113838456B (en) * | 2021-09-28 | 2024-05-31 | 中国科学技术大学 | Phoneme extraction method, voice recognition method, device, equipment and storage medium |
CN114171003A (en) * | 2021-12-09 | 2022-03-11 | 云知声智能科技股份有限公司 | Re-scoring method and device for voice recognition system, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108711422A (en) * | 2018-05-14 | 2018-10-26 | 腾讯科技(深圳)有限公司 | Audio recognition method, device, computer readable storage medium and computer equipment |
CN110517693A (en) * | 2019-08-01 | 2019-11-29 | 出门问问(苏州)信息科技有限公司 | Audio recognition method, device, electronic equipment and computer readable storage medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9520068B2 (en) * | 2004-09-10 | 2016-12-13 | Jtt Holdings, Inc. | Sentence level analysis in a reading tutor |
KR100755677B1 (en) * | 2005-11-02 | 2007-09-05 | 삼성전자주식회사 | Apparatus and method for dialogue speech recognition using topic detection |
JP4674609B2 (en) * | 2008-02-18 | 2011-04-20 | ソニー株式会社 | Information processing apparatus and method, program, and recording medium |
KR102097710B1 (en) * | 2014-11-20 | 2020-05-27 | 에스케이텔레콤 주식회사 | Apparatus and method for separating of dialogue |
US10304013B2 (en) * | 2016-06-13 | 2019-05-28 | Sap Se | Real time animation generator for voice content representation |
US10861446B2 (en) * | 2018-12-10 | 2020-12-08 | Amazon Technologies, Inc. | Generating input alternatives |
CN111145733B (en) * | 2020-01-03 | 2023-02-28 | 深圳追一科技有限公司 | Speech recognition method, speech recognition device, computer equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106683680B (en) | Speaker recognition method and device, computer equipment and computer readable medium | |
JP5901001B1 (en) | Method and device for acoustic language model training | |
CN110364171B (en) | Voice recognition method, voice recognition system and storage medium | |
CN111429946A (en) | Voice emotion recognition method, device, medium and electronic equipment | |
US8818813B2 (en) | Methods and system for grammar fitness evaluation as speech recognition error predictor | |
CN112259084B (en) | Speech recognition method, device and storage medium | |
CN110033760A (en) | Modeling method, device and the equipment of speech recognition | |
WO2018192186A1 (en) | Speech recognition method and apparatus | |
CN113327575B (en) | Speech synthesis method, device, computer equipment and storage medium | |
CN109036471B (en) | Voice endpoint detection method and device | |
JP6615736B2 (en) | Spoken language identification apparatus, method thereof, and program | |
WO2021103712A1 (en) | Neural network-based voice keyword detection method and device, and system | |
CN110473527B (en) | Method and system for voice recognition | |
WO2019017462A1 (en) | Satisfaction estimation model learning device, satisfaction estimation device, satisfaction estimation model learning method, satisfaction estimation method, and program | |
JP6495792B2 (en) | Speech recognition apparatus, speech recognition method, and program | |
CN113053390B (en) | Text processing method and device based on voice recognition, electronic equipment and medium | |
CN113744727A (en) | Model training method, system, terminal device and storage medium | |
US12136435B2 (en) | Utterance section detection device, utterance section detection method, and program | |
JP2018004947A (en) | Text correction device, text correction method, and program | |
CN113053414A (en) | Pronunciation evaluation method and device | |
Damavandi et al. | NN-grams: Unifying neural network and n-gram language models for speech recognition | |
JP6716513B2 (en) | VOICE SEGMENT DETECTING DEVICE, METHOD THEREOF, AND PROGRAM | |
CN113920987B (en) | Voice recognition method, device, equipment and storage medium | |
CN111883109B (en) | Voice information processing and verification model training method, device, equipment and medium | |
US12112749B2 (en) | Command analysis device, command analysis method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20210526 Address after: 100176 room 1004, 10th floor, building 1, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing Applicant after: Beijing Huijun Technology Co.,Ltd. Address before: Room A402, 4th floor, building 2, No.18, Kechuang 11th Street, Daxing District, Beijing, 100176 Applicant before: BEIJING WODONG TIANJUN INFORMATION TECHNOLOGY Co.,Ltd. Applicant before: BEIJING JINGDONG CENTURY TRADING Co.,Ltd. |
GR01 | Patent grant | ||