CN101118745A - Confidence degree quick acquiring method in speech identification system - Google Patents
- Publication number
- CN101118745A (application CN200610089135A / CNA2006100891355A)
- Authority
- CN
- China
- Prior art keywords
- voice
- state
- frame
- speech
- acoustic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The present invention relates to an improved algorithm for computing the confidence of a speech recognition system, comprising: preprocessing with framing; extraction of speech features for every frame of speech; computation of the likelihood probability p(x_t|s_j) of each frame for every state in the state graph, from the state graph, the acoustic model, and the frame's feature vector; storage of p(x_t|s_j) indexed by frame number and state number; pruning of states based on p(x_t|s_j); computation of the likelihood sum of the acoustic space and of the generalized posterior probabilities after pruning; and computation of the generalized posterior probability of each phoneme, which serves as its confidence score. In the prior art, a first search is needed to obtain phoneme candidates, and a second search, using a different acoustic model, is then carried out to compute the confidence. The present invention is a synchronous calculation method that computes the confidence with the same acoustic model while the recognizer performs its frame-synchronous beam search; the search is therefore done only once, saving system time and computational complexity.
Description
Technical Field
The invention belongs to the technical field of speech recognition, and particularly relates to a method for quickly obtaining the confidence of a speech recognition system.
Background
When a speech recognition system is used under natural rather than ideal conditions, its performance degrades greatly. Moreover, real spoken language mixes many non-speech sounds into the speech, such as abnormal pauses, coughs, and environmental noises, which makes it difficult for a conventional speech recognition system to reach its nominal recognition performance. In addition, if the words spoken by the user fall outside the preset domain of the speech recognition system, recognition errors easily result. In summary, the user of a commercial speech recognition system wants erroneous speech rejected as much as possible, and confidence score evaluation is a good way to address these difficulties.
The confidence evaluation method can carry out hypothesis test on the recognition result of the voice recognition system, evaluate the reliability of the recognition result through a threshold value set by tests, and locate errors in the result, thereby improving the recognition rate and the robustness of the recognition system.
At present, a two-pass calculation method is widely used to compute confidence. The input speech is first decoded in one pass by the recognizer, which yields a word graph or word sequence corresponding to the input speech. The second pass is performed on that word graph or word sequence and computes a confidence score, as shown in fig. 2. The two passes use different acoustic models; the second, confidence-computing pass generally uses a full-phoneme model. Because two decoding passes are needed, the confidence computation has high complexity and requires considerable system time, which hinders online use of the speech recognition system.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by considering both calculation speed and robustness, and thereby provides a method for quickly obtaining the confidence with only one search pass.
In order to achieve the above object, the method for quickly obtaining confidence level in a speech recognition system provided by the present invention comprises the following steps:
1) Input the speech to be recognized into the speech recognition system.
2) Preprocess the input speech; the preprocessing includes framing.
3) Extract speech features to obtain the MFCC feature vector of each frame of speech.
4) Traverse all speech frames; for each frame, compute from the state graph, the acoustic model, and the frame's MFCC feature vector the likelihood probability p(x_t|s_j) of each state in the state graph, whose negative logarithm is

-ln p(x_t|s_j) = (n/2)ln(2π) + (1/2)ln|Σ_j| + (1/2)(x_t − μ_j)^T Σ_j^{-1} (x_t − μ_j)

wherein x_t is the input speech feature, s_j is the state of its corresponding Markov model, modeled as the normal distribution N(μ_j, Σ_j), and n is the dimension of the feature vector;
5) Store the likelihood probability p(x_t|s_j) obtained in step 4), indexed by the frame number and state number of the current speech.
6) Judge whether the current pointer points to a virtual node in the state graph; if yes, go to step 7); if no, prune the current state. A virtual node marks the end of a phoneme in the state graph;
7) After pruning, compute the likelihood sum of the acoustic space, p(x_t) ≈ Σ_{s∈D*} p(x_t|s), wherein D* is the set of all states retained in the state graph after pruning;
8) Respectively compute the generalized posterior probability of each phoneme,

CM = (1/N) Σ_{j=1}^{N} [1/(τ_e[j] − τ_b[j] + 1)] Σ_{t=τ_b[j]}^{τ_e[j]} p(x_t|s_j) / Σ_{s∈D*} p(x_t|s)

where N is the number of states that make up each HMM, τ_b[j] and τ_e[j] respectively indicate the starting and ending frame numbers of the speech input data in the current state, and j is the state number; the generalized posterior probability of the phoneme is taken as its confidence score.
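As a minimal sketch of the pruning-based confidence computation described above (all function and variable names are hypothetical; likelihoods are taken here as plain probabilities rather than negative logarithms), the generalized posterior of a phoneme can be computed from the stored per-frame likelihoods and the retained state set D*:

```python
def phoneme_confidence(likelihood, retained, phoneme_states, tau_b, tau_e):
    """Generalized posterior probability of one phoneme.
    likelihood[(t, j)]: p(x_t|s_j) for frame t, state j;
    retained[t]: set D* of states kept after pruning at frame t;
    phoneme_states: state numbers of the phoneme's HMM;
    tau_b[j], tau_e[j]: first/last frame aligned to state j."""
    total = 0.0
    for j in phoneme_states:
        frames = range(tau_b[j], tau_e[j] + 1)
        post = 0.0
        for t in frames:
            # Denominator: likelihood sum over the retained acoustic space
            denom = sum(likelihood[(t, s)] for s in retained[t])
            post += likelihood[(t, j)] / denom
        total += post / len(frames)  # average over the state's frames
    return total / len(phoneme_states)  # average over the phoneme's states
```

The score is bounded by 1 and can be compared directly against a rejection threshold.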
In the above technical solution, the preprocessing the input speech in step 2) includes digitizing, pre-emphasizing, high-frequency boosting, framing, and windowing the input speech.
In the above technical solution, the extracting of the voice feature in step 3) includes: and calculating MFCC cepstrum coefficients, cepstrum weighting and calculating differential cepstrum coefficients.
In the above technical solution, the pruning process in the step 6) adopts a pruning method based on frame synchronization beam search.
The advantage of the invention is that only one decoding pass is needed; in the prior art, after a phoneme search produces phoneme candidates, a second search must be carried out to compute the confidence, and the two searches use different acoustic models.
Drawings
FIG. 1 is a flow diagram of one embodiment of a fast confidence score method of the present invention;
FIG. 2 is a schematic diagram of a confidence two-pass search calculation method of the prior art;
FIG. 3 is a schematic diagram of the word network of the present invention;
FIG. 4 is a schematic diagram of a state diagram of the present invention;
FIG. 5 is a schematic diagram of confidence synchronization calculation pruning based on a state diagram according to the present invention;
FIG. 6 is a ROC plot of the performance of the one-pass search method of the present invention versus the two-pass search method of the prior art.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
Examples
As shown in fig. 1, the method for fast obtaining confidence in a speech recognition system provided by the present invention includes the following steps:
a) Input the speech to be recognized into the speech recognition system.
b) Speech preprocessing, which mainly performs framing. In this embodiment, preprocessing follows this flow:
1. Digitize the speech signal at a 16 kHz sampling rate.
2. Boost high frequencies by pre-emphasis; the pre-emphasis filter is H(z) = 1 − αz^{-1}, where α = 0.98.
3. Frame the data with a frame length of 20 ms and a 10 ms overlap between frames.
4. Windowing. The window function is the common Hamming window, i.e. w(n) = 0.54 − 0.46·cos(2πn/(N−1)), 0 ≤ n ≤ N−1.
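The preprocessing flow above can be sketched as follows (constants taken from the embodiment: α = 0.98, 16 kHz sampling, 20 ms frames with a 10 ms shift; function names are hypothetical):

```python
import math

ALPHA = 0.98          # pre-emphasis coefficient from the embodiment
FRAME_LEN = 320       # 20 ms at 16 kHz
FRAME_SHIFT = 160     # 10 ms shift => 10 ms overlap between frames

def pre_emphasize(signal):
    """Apply H(z) = 1 - alpha*z^-1, i.e. y[n] = x[n] - alpha*x[n-1]."""
    return [signal[0]] + [signal[n] - ALPHA * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_and_window(signal):
    """Split into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (FRAME_LEN - 1))
              for n in range(FRAME_LEN)]
    frames = []
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_SHIFT):
        frame = signal[start:start + FRAME_LEN]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```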
c) Speech feature extraction. The invention adopts the MFCC (mel-frequency cepstral coefficient) feature extraction method; the specific flow is:
5. Compute the MFCC cepstrum coefficients c(m), 1 ≤ m ≤ N_c,

c(m) = Σ_{k=1}^{K} log(E_k)·cos(πm(k − 0.5)/K)

where E_k is the output energy of the k-th mel filter, K is the number of mel filters, and N_c is the number of cepstral coefficients, N_c = 14.
6. Cepstral weighting, i.e. adjusting the weight of each dimension of the cepstral coefficients.
7. Compute the first- and second-order differences of the energy feature and the cepstral features. The differential cepstrum coefficients are computed with the regression formula

Δc_t = μ Σ_{τ=−T}^{T} τ·c_{t+τ}

where μ is the normalization factor, τ is an integer, and 2T+1 is the number of speech frames used to compute the differential cepstrum coefficients.
8. For each frame, generate a 39-dimensional MFCC feature vector.
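The differential-cepstrum regression can be sketched in the common symmetric form Δc_t = Σ_{τ=1}^{T} τ(c_{t+τ} − c_{t−τ}) / (2 Σ_{τ=1}^{T} τ²), which is the formula above with the normalization factor μ folded in (a sketch under that assumption; names hypothetical, edges clamped):

```python
def delta_coefficients(ceps, T=2):
    """Differential cepstrum over 2*T+1 frames via the regression formula:
    delta[t] = sum_{tau=1..T} tau*(c[t+tau] - c[t-tau]) / (2*sum tau^2)."""
    norm = 2.0 * sum(tau * tau for tau in range(1, T + 1))
    deltas = []
    for t in range(len(ceps)):
        acc = 0.0
        for tau in range(1, T + 1):
            right = ceps[min(t + tau, len(ceps) - 1)]  # clamp at sequence edges
            left = ceps[max(t - tau, 0)]
            acc += tau * (right - left)
        deltas.append(acc / norm)
    return deltas
```

On a linearly increasing cepstral track the interior deltas recover the slope, which is the intended behavior of the regression.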
The invention can also adopt LPC feature extraction method, which is the prior art and is not described again.
d) For each frame of speech, compute the likelihood probability p(x_t|s_j) of each state composing the phoneme Markov models, based on the state graph, the acoustic model, and the frame's own MFCC feature vector. The likelihood probability p(x_t|s_j) is the acoustic-layer score of the Markov model state s_j for the input speech feature x_t.
The method for constructing the state diagram utilized in this step is as follows:
As shown in fig. 3, a word-based search space, i.e. a word network, is first built according to the content of the task grammar; the recognizer searches this network for the best path corresponding to the input speech as the recognition result. Before searching, the word network is expanded into a phoneme network, whose minimum unit is a phoneme, using the dictionary of the recognition system: each node is transformed from a word into phonemes, and each phoneme is then replaced by its corresponding Markov model (HMM) in the acoustic model. Each Markov model (HMM) is composed of several states, so the final search space becomes a state graph, as shown in FIG. 4.
In fig. 4, each node represents one state in a certain HMM. Any path in the state diagram represents a sentence or word candidate in the task grammar. In order to reduce the search space and the space required for storage, the state diagrams are merged, so that the final state diagram is obtained. In this process, each node is subjected to forward combining and backward combining. When forward combining, searching nodes with the same forward path and combining; when backward combining, those nodes with the same backward path are combined.
The method of calculating the likelihood probability for each state is as follows:
Traversing all speech frames, when a frame of data enters the recognizer, the likelihood probability p(x_t|s_j) of each state in the state graph is first computed for the current frame; the accumulation of the likelihood probability and the state transition probability, compared against the pruning threshold, serves as the basis for pruning. The likelihood probability p(x_t|s_j) is the acoustic-layer score of the Markov model state s_j for the input speech feature x_t, and its negative logarithm is

-ln p(x_t|s_j) = (n/2)ln(2π) + (1/2)ln|Σ_j| + (1/2)(x_t − μ_j)^T Σ_j^{-1} (x_t − μ_j)

wherein state s_j is modeled as the normal distribution N(μ_j, Σ_j), whose parameters are obtained from the acoustic model; x_t is the feature vector of the speech frame; μ_j and Σ_j are respectively the mean vector and covariance matrix of the model of state s_j; and n is the dimension of the feature vector x_t (i.e. of μ_j and Σ_j).
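This negative log-likelihood can be evaluated directly; a sketch assuming a diagonal covariance matrix Σ_j (a common simplification — a full covariance would need a matrix inverse and determinant; names are hypothetical):

```python
import math

def state_neg_log_likelihood(x, mu, sigma2):
    """-ln p(x_t|s_j) for N(mu_j, Sigma_j) with diagonal covariance:
    (n/2)ln(2*pi) + (1/2)ln|Sigma_j| + (1/2)(x-mu)^T Sigma_j^{-1} (x-mu).
    sigma2 holds the diagonal variances of Sigma_j."""
    n = len(x)
    nll = 0.5 * n * math.log(2.0 * math.pi)
    for xi, mi, vi in zip(x, mu, sigma2):
        nll += 0.5 * math.log(vi) + (xi - mi) ** 2 / (2.0 * vi)
    return nll
```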
The acoustic model employed in this embodiment contains 5005 states with 16 Gaussian mixture components.
e) Store the likelihood probability p(x_t|s_j) obtained in step d), indexed by the frame number and state number of the current speech.
f) Judging whether the pointer points to the virtual node, if so, entering the step g); if not, pruning the current state.
In the state diagram used by the recognition system, each phoneme has a dummy node as a marker for ending. A phoneme is identified as long as the search pointer reaches a dummy node.
In the decoding process of the recognizer, a pruning strategy is applied to improve decoding speed and reduce the search space. In fig. 5, solid dots represent states retained after pruning, and hollow dots represent states that are pruned. As shown, when a state contributes too little to the observation sequence (in this embodiment, the MFCC feature vectors), i.e. its likelihood probability p(x_t|s_j) for the observation sequence falls below a preset threshold, the state is cut off. This embodiment uses a pruning strategy based on frame-synchronous beam search in the decoding process; the search itself uses the conventional Viterbi algorithm. The pruning threshold is set to 200, and the pruning criterion is: take the log probability of the current frame for each state, subtract the pruning threshold from the maximum log probability at the current position, and prune any state whose log probability is smaller than the resulting value.
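The pruning criterion just described can be sketched as follows (beam width 200 taken from the embodiment; function and variable names are hypothetical):

```python
BEAM = 200.0  # pruning threshold from the embodiment (log domain)

def prune_states(log_scores):
    """Frame-synchronous beam pruning: keep only states whose accumulated
    log score is within BEAM of the best state at the current frame.
    log_scores maps state number -> accumulated log probability."""
    best = max(log_scores.values())
    return {j: s for j, s in log_scores.items() if s >= best - BEAM}
```

The surviving dictionary is the set D* used as the acoustic-space denominator in the confidence computation.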
g) After pruning, compute the likelihood sum of the acoustic space, p(x_t) ≈ Σ_{s∈D*} p(x_t|s), wherein D* is the set of all states retained in the state graph after pruning.
The accumulated likelihood of the states retained after pruning is much larger than that of the pruned states, so this sum can safely serve as the denominator of the generalized posterior probability.
h) Compute the generalized posterior probability of each phoneme.
In speech recognition systems, each phoneme is represented by a Markov model (HMM). The generalized posterior probability of each phoneme is defined as the arithmetic mean of the posterior probabilities of the states corresponding to the phoneme:

CM = (1/N) Σ_{j=1}^{N} [1/(τ_e[j] − τ_b[j] + 1)] Σ_{t=τ_b[j]}^{τ_e[j]} p(s_j|x_t),  with  p(s_j|x_t) = p(x_t|s_j) / Σ_{s∈D*} p(x_t|s)

where N is the number of states that make up each HMM, τ_b[j] and τ_e[j] are respectively the starting and ending frame numbers of the speech input data in the current state, j is the state number, and the denominator Σ_{s∈D*} p(x_t|s) is the likelihood sum obtained in step g).
i) The generalized posterior probability of a phoneme can be used as the confidence score of the phoneme.
The state-graph-based confidence synchronous estimation algorithm of the present invention was tested on a Chinese telephone-name database used for testing an actual telephone speech recognition system. The test task was to evaluate the recognition rate of a recognition system with a 1278-name dictionary. The test speech was normal speech from 6 speakers, 3 men and 3 women. The test set includes 180 out-of-vocabulary words, and each task grammar includes 213 person names. The confidence score is used to reject the out-of-vocabulary words in the test set; the goal is to increase the rejection rate, i.e. to reduce the false acceptance rate of out-of-vocabulary words.
Two different algorithms are used to compute confidence. One is the two-pass (2-Pass) search algorithm shown in fig. 2; the other, the one-pass (1-Pass) algorithm, is the state-graph-based confidence synchronous calculation method of the invention. The two-pass search algorithm uses two different acoustic models: the first pass uses an acoustic model containing 5005 states with 16 Gaussian mixtures, while the model used to compute confidence is a smaller model covering only all phonemes, containing 1005 states and 8 Gaussian mixtures. The one-pass search algorithm uses a single acoustic model containing 5005 states and 16 Gaussian mixtures.
The ROC (receiver operating characteristic) performance curves of the two algorithms are shown in fig. 6. The figure shows that the one-pass search algorithm of the invention outperforms the two-pass search algorithm: its equal error rate is 16.1%, versus 21% for the two-pass algorithm. Because the one-pass search algorithm uses a single, finer acoustic model for the confidence computation, performance does not degrade even though the post-pruning acoustic-space sum is an approximation.
In addition, the computational complexity of the two methods differs: the one-pass search algorithm is 16% faster than the two-pass search algorithm.
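The equal error rate reported above can be estimated from two confidence-score lists by sweeping a threshold; a sketch on hypothetical data (not the patent's test set; names hypothetical):

```python
def equal_error_rate(keep_scores, reject_scores):
    """Approximate the equal error rate: sweep thresholds and return the
    operating point where the false rejection rate and the false
    acceptance rate are closest.
    keep_scores: confidence of utterances that should be accepted;
    reject_scores: confidence of utterances that should be rejected."""
    best_gap, eer = float("inf"), 0.0
    for th in sorted(set(keep_scores) | set(reject_scores)):
        frr = sum(s < th for s in keep_scores) / len(keep_scores)
        far = sum(s >= th for s in reject_scores) / len(reject_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer
```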
Claims (4)
1. A method for quickly obtaining confidence in a speech recognition system, characterized by comprising the following steps:
1) Inputting the voice to be recognized into a voice recognition system;
2) Preprocessing input voice, wherein the preprocessing comprises framing processing;
3) Extracting MFCC feature vectors of each frame of voice;
4) Traversing all speech frames, and for each frame, computing from the state graph and acoustic model in the speech recognition system and the frame's MFCC feature vector the likelihood probability p(x_t|s_j) of each state in the state graph, whose negative logarithm is

-ln p(x_t|s_j) = (n/2)ln(2π) + (1/2)ln|Σ_j| + (1/2)(x_t − μ_j)^T Σ_j^{-1} (x_t − μ_j)

wherein x_t is the feature vector of the speech frame, μ_j and Σ_j are respectively the mean vector and covariance matrix of the model of state s_j, and n is the dimension of the feature vector;
5) Storing the likelihood probability p(x_t|s_j) obtained in step 4), indexed by the frame number and state number of the current speech;
6) Judging whether the current pointer points to a virtual node in the state graph; if yes, going to step 7); if no, pruning the current state; the virtual node is a marker of the end of a phoneme in the state graph;
7) After pruning, calculating the likelihood sum of the acoustic space, p(x_t) ≈ Σ_{s∈D*} p(x_t|s), wherein D* is the set of all states retained in the state graph after pruning;
8) Computing the generalized posterior probability of each phoneme,

CM = (1/N) Σ_{j=1}^{N} [1/(τ_e[j] − τ_b[j] + 1)] Σ_{t=τ_b[j]}^{τ_e[j]} p(x_t|s_j) / Σ_{s∈D*} p(x_t|s)

wherein N is the number of states composing each Markov model, τ_b[j] and τ_e[j] respectively indicate the starting and ending frame numbers of the speech input data in the current state, and j is the state number; and taking the generalized posterior probability of the phoneme as its confidence score.
2. The method of claim 1 wherein preprocessing the input speech in step 2) includes digitizing, pre-emphasizing, high-frequency boosting, framing and windowing the input speech.
3. The method of fast confidence level calculation in a speech recognition system according to claim 1, wherein said extracting speech features in step 3) comprises: and calculating MFCC cepstrum coefficients, cepstrum weighting and calculating differential cepstrum coefficients.
4. The method for fast confidence level estimation in a speech recognition system according to claim 1, wherein the pruning in step 6) is performed by a pruning method based on frame-synchronous beam search.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2006100891355A CN101118745B (en) | 2006-08-04 | 2006-08-04 | Confidence degree quick acquiring method in speech identification system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101118745A true CN101118745A (en) | 2008-02-06 |
CN101118745B CN101118745B (en) | 2011-01-19 |
Family
ID=39054824
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2006100891355A Expired - Fee Related CN101118745B (en) | 2006-08-04 | 2006-08-04 | Confidence degree quick acquiring method in speech identification system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101118745B (en) |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5566272A (en) * | 1993-10-27 | 1996-10-15 | Lucent Technologies Inc. | Automatic speech recognition (ASR) processing using confidence measures |
US5737489A (en) * | 1995-09-15 | 1998-04-07 | Lucent Technologies Inc. | Discriminative utterance verification for connected digits recognition |
US5794189A (en) * | 1995-11-13 | 1998-08-11 | Dragon Systems, Inc. | Continuous speech recognition |
CN1223985C (en) * | 2002-10-17 | 2005-10-19 | 中国科学院声学研究所 | Phonetic recognition confidence evaluating method, system and dictation device therewith |
CN100514446C (en) * | 2004-09-16 | 2009-07-15 | 北京中科信利技术有限公司 | Pronunciation evaluating method based on voice identification and voice analysis |
GB0426347D0 (en) * | 2004-12-01 | 2005-01-05 | Ibm | Methods, apparatus and computer programs for automatic speech recognition |
-
2006
- 2006-08-04 CN CN2006100891355A patent/CN101118745B/en not_active Expired - Fee Related
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102047322B (en) * | 2008-06-06 | 2013-02-06 | 株式会社雷特龙 | Audio recognition device, audio recognition method, and electronic device |
CN101393739B (en) * | 2008-10-31 | 2011-04-27 | 清华大学 | Computation method for characteristic value of Chinese speech recognition credibility |
CN101645271B (en) * | 2008-12-23 | 2011-12-07 | 中国科学院声学研究所 | Rapid confidence-calculation method in pronunciation quality evaluation system |
CN101650886B (en) * | 2008-12-26 | 2011-05-18 | 中国科学院声学研究所 | Method for automatically detecting reading errors of language learners |
CN102142253B (en) * | 2010-01-29 | 2013-05-29 | 富士通株式会社 | Voice emotion identification equipment and method |
CN101894549A (en) * | 2010-06-24 | 2010-11-24 | 中国科学院声学研究所 | Method for fast calculating confidence level in speech recognition application field |
CN101980336A (en) * | 2010-10-18 | 2011-02-23 | 福州星网视易信息系统有限公司 | Hidden Markov model-based vehicle sound identification method |
CN103811008A (en) * | 2012-11-08 | 2014-05-21 | 中国移动通信集团上海有限公司 | Audio frequency content identification method and device |
CN103810997A (en) * | 2012-11-14 | 2014-05-21 | 北京百度网讯科技有限公司 | Method and device for determining confidence of voice recognition result |
CN103810997B (en) * | 2012-11-14 | 2018-04-03 | 北京百度网讯科技有限公司 | A kind of method and apparatus for determining voice identification result confidence level |
CN103021408A (en) * | 2012-12-04 | 2013-04-03 | 中国科学院自动化研究所 | Method and device for speech recognition, optimizing and decoding assisted by stable pronunciation section |
CN107004408B (en) * | 2014-12-09 | 2020-07-17 | 微软技术许可有限责任公司 | Method and system for determining user intent in spoken dialog based on converting at least a portion of a semantic knowledge graph to a probabilistic state graph |
CN107004408A (en) * | 2014-12-09 | 2017-08-01 | 微软技术许可有限责任公司 | For determining the method and system of the user view in spoken dialog based at least a portion of semantic knowledge figure is converted into Probability State figure |
CN106297769A (en) * | 2015-05-27 | 2017-01-04 | 国家计算机网络与信息安全管理中心 | A kind of distinctive feature extracting method being applied to languages identification |
CN106297769B (en) * | 2015-05-27 | 2019-07-09 | 国家计算机网络与信息安全管理中心 | A kind of distinctive feature extracting method applied to languages identification |
CN106611048A (en) * | 2016-12-20 | 2017-05-03 | 李坤 | Language learning system with online voice assessment and voice interaction functions |
CN110447068A (en) * | 2017-03-24 | 2019-11-12 | 三菱电机株式会社 | Speech recognition equipment and audio recognition method |
CN109872715A (en) * | 2019-03-01 | 2019-06-11 | 深圳市伟文无线通讯技术有限公司 | A kind of voice interactive method and device |
CN112151020A (en) * | 2019-06-28 | 2020-12-29 | 北京声智科技有限公司 | Voice recognition method and device, electronic equipment and storage medium |
CN112151020B (en) * | 2019-06-28 | 2024-06-18 | 北京声智科技有限公司 | Speech recognition method, device, electronic equipment and storage medium |
CN110634469A (en) * | 2019-09-27 | 2019-12-31 | 腾讯科技(深圳)有限公司 | Speech signal processing method and device based on artificial intelligence and storage medium |
CN110634469B (en) * | 2019-09-27 | 2022-03-11 | 腾讯科技(深圳)有限公司 | Speech signal processing method and device based on artificial intelligence and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN101118745B (en) | 2011-01-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20110119 |