JP2004219747A - Device and method for speech recognition, and program - Google Patents

Info

Publication number
JP2004219747A
Authority
JP
Japan
Prior art keywords
utterance
recognition result
recognition
result
reliability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2003007378A
Other languages
Japanese (ja)
Other versions
JP3695448B2 (en)
Inventor
Seiichi Miki
清一 三木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP2003007378A
Publication of JP2004219747A
Application granted
Publication of JP3695448B2
Anticipated expiration
Expired - Lifetime

Abstract

PROBLEM TO BE SOLVED: To improve speech recognition performance by referring to both a previously obtained speech recognition result and the currently obtained one, while making it possible to obtain the correct answer even when it does not appear in both results.
SOLUTION: Provided are: a recognition part 100 which recognizes speech each time an utterance is made and calculates a likelihood showing how probable each recognition result candidate is; a recognition result storage part 110 which stores the recognition result candidates and their likelihoods for each utterance; a reliability calculation part 120 which calculates, for each utterance, a reliability as a score normalized according to the likelihoods of the recognition result candidates; a result selection part 130 which selects a recognition result from among the stored recognition result candidates of the respective utterances according to the reliability; a misrecognition result storage part 150 which stores recognition results of utterances up to the previous one that were judged to be misrecognitions; and a result filtering part 160 which removes the stored misrecognition results from the selected recognition results and selects a recognition result again.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

【0001】
【発明の属する技術分野】
本発明は音声認識装置、音声認識方法、及びプログラムに関し、特にユーザが発声を誤認識され、同内容の発声が再度入力された場合に、誤認識した発声の認識結果と再度入力された発声の認識結果とを用いた音声認識技術に関する。
【0002】
【従来の技術】
従来の音声認識装置の一例を示す特開平10−133684号公報によれば、この音声認識装置は以前に認識された認識結果候補と、新たに認識された認識結果候補を使用してそれぞれの発声に一致する確率が最も高い認識結果(ただし、誤認識結果を除く)を選択することが示されている。具体的には、最も高い確率を計算するために両方の認識結果候補から共通の認識結果候補を検出し、それらの確率を乗算した結果の合成確率を用いて選択する。
【0003】
具体例として図9に話し手が「make」と2回発声した場合にそれぞれを認識した認識結果候補とそれに対応する確からしさが示されている。図9を参照すれば、2回目の発声に対応する新たに認識された結果でのみ判断すると、「Fake」の確率=0.4が最も高いので「Fake」が誤って選択される。1回目で「Fake」が誤認識され、その結果を反映して「Fake」を除去したとしても「Fake」の次に高い確率=0.3の「Mace」が誤って選択される。
【0004】
ところが、誤認識の「Fake」を除去し、さらに前回と今回の2回の発声の各認識結果候補の合成確率で見ると、「Make」が0.06(=0.3x0.2)、「Mace」が0.03(=0.1x0.3)、「Bake」が0.01(=0.1x0.1)となるため、最も高い合成確率の「Make」を正しく選択できる。
【0005】
また、特開2000−250585号公報では、データベースの検索で検索の対象となる音声検索キーの確定を行う際に、入力された音声(例:市町村名)の音声検索キー候補の尤度と、音声検索キーの属性項目の関連情報(例:都道府県名)の質問に対する応答を音声認識した関連情報の尤度とをそれぞれ正規化して乗算し認識尤度とすることが示されている。この手法では、尤度を正規化しているが、誤認識した前回の結果との調整をするためのものではなく、また両尤度を乗算した結果を認識尤度としている。
【0006】
【特許文献1】
特開平10−133684号公報 (第5頁)
【特許文献2】
特開2000−250585号公報
【0007】
【発明が解決しようとする課題】
上述した前回の結果を参照する音声認識装置では、正解が前回に認識された結果と今回認識された結果の両方に出現しないと正解が得られないという問題がある。すなわち、正解が一方にしか出現しなかった場合はその合成確率は乗算の結果”0”になりその結果は最小となり選択されない。この問題は入力された音声と関連情報の入力との結果を乗算する音声認識装置においても解決されていない。
【0008】
本発明の目的は、前回得られた音声認識結果と今回得られた音声認識結果の両方を参照して音声認識性能を向上させつつ、正解が両方に出現しない場合でも正解を得られるようにして音声認識性能を向上させた音声認識装置、音声認識方法、及びプログラムを提供することにある。
【0009】
【課題を解決するための手段】
本発明の第1の音声認識装置は、同内容の発声が複数入力される場合に、各発声間で比較可能となるように認識結果候補の確からしさを示す確率値を正規化した値を用いて各発声の各認識結果候補の中から認識結果を選択することを特徴とする。
【0010】
本発明の第2の音声認識装置は、同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出し、各発声の認識結果候補の中から信頼度に基づいて認識結果を選択することを特徴とする。
【0011】
本発明の第3の音声認識装置は、同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出し、前回の発声までの間に誤認識とされた認識結果候補を除去した認識結果候補の中から信頼度に基づいて認識結果を選択することを特徴とする。
【0012】
本発明の第4の音声認識装置は、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算する認識部と、発声毎の認識結果候補とその尤度とを蓄積する認識結果蓄積部と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する信頼度計算部と、信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択する結果選択部と、を有することを特徴とする。
【0013】
本発明の第5の音声認識装置は、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算する認識部と、発声毎の認識結果候補とその尤度とを蓄積する認識結果蓄積部と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する信頼度計算部と、信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択する結果選択部と、前回までの発声に対する認識結果に対して誤認識と判定された誤認識結果を蓄積する誤認識情報蓄積部と、結果選択部で選択された認識結果から誤認識情報蓄積部に蓄積された誤認識結果を除去して認識結果を再選択する結果フィルタリング部と、を有することを特徴とする。
【0014】
本発明の第6の音声認識装置は、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算する認識部と、発声毎の認識結果候補とその尤度とを蓄積する認識結果蓄積部と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する信頼度計算部と、前回までの発声に対する認識結果に対して誤認識と判定された誤認識結果を蓄積する誤認識情報蓄積部と、発声毎の認識結果候補から誤認識結果を除去する結果フィルタリング部と、誤認識結果を除去した後の各発声の各認識結果候補の中から認識結果を選択する結果選択部と、を有することを特徴とする。
【0015】
本発明の第7の音声認識装置は、同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出し、さらに全ての発声の認識結果候補から同じ認識結果候補毎に相加平均を計算した合成信頼度を求め、前回の発声までの間に誤認識とされた認識結果候補を除去した認識結果候補の中から合成信頼度に基づいて認識結果を選択することを特徴とする。
【0016】
本発明の第8の音声認識装置は、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算する認識部と、発声毎の認識結果候補とその尤度とを蓄積する認識結果蓄積部と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する信頼度計算部と、全ての発声の認識結果候補から同じ認識結果候補毎に相加平均を計算した合成信頼度を求める相加平均計算部と、合成信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択する結果選択部と、前回までの発声に対する認識結果に対して誤認識と判定された誤認識結果を蓄積する誤認識情報蓄積部と、結果選択部で選択された認識結果から誤認識情報蓄積部に蓄積された誤認識結果を除去して再選択する結果フィルタリング部と、を有することを特徴とする。
【0017】
本発明の第1の音声認識方法は、同内容の発声が複数入力される場合に、各発声間で比較可能となるように認識結果候補の確からしさを示す確率値を正規化した値を用いて各発声の各認識結果候補の中から認識結果を選択することを特徴とする。
【0018】
本発明の第2の音声認識方法は、同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出し、各発声の認識結果候補の中から信頼度に基づいて認識結果を選択することを特徴とする。
【0019】
本発明の第3の音声認識方法は、同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出し、前回の発声までの間に誤認識とされた認識結果候補を除去した認識結果候補の中から信頼度に基づいて認識結果を選択することを特徴とする。
【0020】
本発明の第4の音声認識方法は、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算して蓄積し、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算し、計算された信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択することを特徴とする。
【0021】
本発明の第5の音声認識方法は、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算して蓄積し、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算し、計算された信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択し、前回までの発声に対する認識結果に対して誤認識と判定され蓄積された誤認識結果を選択された認識結果から除去して認識結果を再選択することを特徴とする。
【0022】
本発明の第6の音声認識方法は、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算して蓄積し、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算し、前回までの発声に対する認識結果に対して誤認識と判定され蓄積された誤認識結果を蓄積された認識結果候補から除去し、誤認識結果を除去した後の各発声の各認識結果候補の中から認識結果を選択することを特徴とする。
【0023】
本発明の第7の音声認識方法は、同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出し、さらに全ての発声の認識結果候補から同じ認識結果候補毎に相加平均を計算した合成信頼度を求め、前回の発声までの間に誤認識と判定された認識結果候補を除去した認識結果候補の中から合成信頼度に基づいて認識結果を選択することを特徴とする。
【0024】
本発明の第8の音声認識方法は、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算して蓄積し、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算し、全ての発声の認識結果候補から同じ認識結果候補毎に相加平均を計算した合成信頼度を求め、合成信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択し、前回までの発声に対する認識結果に対して誤認識と判定され蓄積された誤認識結果を選択された認識結果から除去して再選択することを特徴とする。
【0025】
本発明の第1のプログラムは、同内容の発声が複数入力される場合に、各発声間で比較可能となるように認識結果候補の確からしさを示す確率値を正規化した値を用いて各発声の各認識結果候補の中から認識結果を選択する手順をコンピュータに実行させる。
【0026】
本発明の第2のプログラムは、同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出する手順と、各発声の認識結果候補の中から信頼度に基づいて認識結果を選択する手順とをコンピュータに実行させる。
【0027】
本発明の第3のプログラムは、同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出する手順と、前回の発声までの間に誤認識とされた認識結果候補を除去した認識結果候補の中から信頼度に基づいて認識結果を選択する手順とをコンピュータに実行させる。
【0028】
本発明の第4のプログラムは、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算して蓄積する手順と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する手順と、計算された信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択する手順とをコンピュータに実行させる。
【0029】
本発明の第5のプログラムは、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算して蓄積する手順と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する手順と、計算された信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択する手順と、前回までの発声に対する認識結果に対して誤認識と判定され蓄積された誤認識結果を選択された認識結果から除去して認識結果を再選択する手順とをコンピュータに実行させる。
【0030】
本発明の第6のプログラムは、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算して蓄積する手順と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する手順と、前回までの発声に対する認識結果に対して誤認識と判定され蓄積された誤認識結果を蓄積された認識結果候補から除去する手順と、誤認識結果を除去した後の各発声の各認識結果候補の中から認識結果を選択する手順とをコンピュータに実行させる。
【0031】
本発明の第7のプログラムは、同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出する手順と、さらに全ての発声の認識結果候補から同じ認識結果候補毎に相加平均を計算した合成信頼度を求める手順と、前回の発声までの間に誤認識と判定された認識結果候補を除去した認識結果候補の中から合成信頼度に基づいて認識結果を選択する手順とをコンピュータに実行させる。
【0032】
本発明の第8のプログラムは、発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算して蓄積する手順と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する手順と、全ての発声の認識結果候補から同じ認識結果候補毎に相加平均を計算した合成信頼度を求める手順と、合成信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択する手順と、前回までの発声に対する認識結果に対して誤認識と判定され蓄積された誤認識結果を選択された認識結果から除去して再選択する手順とをコンピュータに実行させる。
【0033】
【発明の実施の形態】
次に、本発明の実施の形態について図面を参照して詳細に説明する。図1は本発明の第1の実施の形態を示したブロック図である。図1を参照すると、本発明の第1の実施の形態の音声認識装置10は、プログラムで実現されるかプログラムを含む認識部100、信頼度計算部120、結果選択部130、修正部140、結果フィルタリング部160、及び結果提示部170と、記憶手段に設けられる認識結果蓄積部110と誤認識結果蓄積部150とを含んでいる。
【0034】
認識部100はユーザの発声により入力された音声に対して予め決められた方法にて1又は複数の認識結果候補及び認識結果候補毎の確からしさを示す尤度を出力する。この予め決められた方法については特に限定しない。認識結果蓄積部110は認識部100で出力された認識結果候補及び尤度を蓄積しておく記憶手段である。
【0035】
信頼度計算部120はユーザの発声毎の各認識結果候補の尤度に対して正規化を実行し信頼度と呼ぶスコアを算出する。信頼度は発声毎に独立に計算でき、しかもその値は各発声間で比較可能である。結果選択部130は複数回の発声の認識結果候補をまとめて信頼度の高い順に結果を選択する。
【0036】
結果提示部170はユーザの発声に対する音声認識装置10の認識結果をユーザへ提示する機能を持ち、提示内容は表示装置に表示したり、音声で通知したりする方法があるが特に限定しない。修正部140は結果提示部170が提示した音声認識結果に対してユーザが正誤を判定し誤認識を指摘する機能を提供する。ユーザが誤認識を指摘する方法は、例えばキーボードやタッチパネル等の入力手段を操作したり、決められたボタンを押下したり、あるいは「はい」「いいえ」と発声したりするような方法でよいが、特に限定しない。
【0037】
誤認識結果蓄積部150はユーザの指摘による誤認識結果を蓄積する記憶手段である。結果フィルタリング部160は結果選択部130で信頼度の高い順に結果を選択された認識結果候補から誤認識結果蓄積部150に蓄積された誤認識結果を除去する機能を持つ。
【0038】
次に、本発明の第1の実施の形態の動作について図面を参照して詳細に説明する。図2は本発明の第1の実施の形態の動作を示したフローチャートである。まず、ユーザの発声した音声が入力されると(S51)、認識部100は、音声認識を行い認識結果候補と認識結果候補毎の確からしさを示す尤度を結果として得て(S52)、結果の認識結果候補と各認識結果候補の尤度とを認識結果蓄積部110に蓄積する(S53)。
【0039】
次に、信頼度計算部120は、認識結果蓄積部110に蓄積されている結果全てに対して認識結果候補毎に信頼度を計算する(S54)。実際には発声毎に計算できるので各発声毎にその結果に対して計算することができる。信頼度P(候補k,発声m)は次のように計算される。
P(候補k,発声m)=exp(L(候補k,発声m))/Z(発声m)
ここで、L(候補k,発声m)は第m回目の発声の第k番目の認識結果候補の尤度であり、Z(発声m)は第m回目の発声の全認識結果候補についてPの和が1になるようにする正規化係数である。
【0040】
Z(発声m)は次式のように計算される。
Z(発声m)=Σexp(L(候補k,発声m))
kは発声mの全認識結果候補
Lは認識部100により計算される尤度であるが、例えば発声長で正規化したり、テキストデータから学習される、認識結果の出易さを示す言語スコアを適当な重み付けをして加えたりすることもできる。
【0041】
結果選択部130は、信頼度計算部120で算出された信頼度の高い順に認識結果候補を選択する(S55)。この時、認識結果蓄積部110に蓄積されている認識結果候補全てを対象としているが、例えば各発声で上位n位までの認識結果候補のみを対象としてもよいし、最近の数回の発声の結果のみを対象としてもよいし、それらを組み合わせてもよい。
【0042】
結果フィルタリング部160は、結果選択部130で選択された認識結果候補に対して誤認識結果蓄積部150に蓄積されている誤認識結果と一致する認識結果候補を取り除いて選択し直す(S56)。最終的に最も高い信頼度の認識結果候補から順番に1又は数個の認識結果候補が認識結果として選択され結果提示部170によりユーザに提示される(S57)。
【0043】
ユーザは結果の正誤を判定して入力する(S58)。ユーザが全て誤認識と入力すると、修正部140は今回提示した認識結果である認識結果候補を誤認識結果蓄積部150に蓄積し(S59)、次のユーザの発声の入力を処理する。この時、提示される結果は必ずしも一つとは限らない。すなわち、複数提示される場合はステップS59において蓄積される誤認識は複数になる。
【0044】
誤認識でなかった場合すなわち音声認識に成功した場合、結果提示部170は誤認識結果蓄積部150及び認識結果蓄積部110を初期化し(S60)、対象とするユーザの発声内容に対する音声認識が終了する。必要に応じて次のユーザの発声内容の音声認識を開始できる。
【0045】
このように、本発明の第1の実施の形態の音声認識装置10では、信頼度計算部120を持つことにより、発声毎に各認識結果候補について信頼度を計算でき、その値は他の発声の結果の信頼度と直接比較できるため、全ての発声に正解が含まれていなくても信頼度に応じて適切な認識結果を選択でき、よりよい認識精度を得ることができる。
【0046】
次に、具体例を用いて本実施の形態の動作を説明する。具体例ではユーザが「いわい」と2回発声した場合の動作例を示す。まず、ユーザが「いわい」と発声すると(S51)、認識部100は音声認識を実行し結果として認識結果候補とその尤度を得る(S52)。
【0047】
図3は1回目と2回目のそれぞれの結果の認識結果候補とその尤度を示した図である。図3の結果の1回目が認識結果蓄積部110に蓄積される(S53)。取り消し線は1回目の発声の認識結果が誤認識となったため、2回目の発声の際に取り除かれたことを説明するために付けたものであり、蓄積される結果としては含まれない。図3では、1回目では「ひらい」の尤度が最大であり、正解の「いわい」の尤度はその次となっている。
【0048】
信頼度計算部120は、正規化を実行し図4に示す1回目の結果を得る(S54)。図4は、図3に示した認識結果候補の尤度を正規化した値である信頼度を認識結果候補毎に示した図である。図4では1回目と2回目のそれぞれの値を示しているが、この時点では1回目の結果のみが計算される。
【0049】
結果選択部130は、1回目の信頼度の順に認識結果候補を選択する(S55)。結果フィルタリング部160は、誤認識結果蓄積部150を参照するが1回目であるため蓄積はないので選択された認識結果候補は取り除かれることなく(S56)、「ひらい」の信頼度が最大のため、結果提示部170は「ひらい」を認識結果としてユーザに提示する(S57)。
【0050】
ユーザは提示内容を見て誤認識と判断して入力する。修正部140は誤認識の入力を受けて「ひらい」を誤認識結果蓄積部150に蓄積し(S59)、ユーザの2回目の発声の入力を待つ(S51)。
【0051】
ユーザが2回目の発声を入力すると認識部100は音声認識を実行し図3に示す2回目の結果を得て(S52)、認識結果蓄積部110に蓄積する(S53)。信頼度計算部120は、認識結果蓄積部110に蓄積された1回目と2回目の結果を正規化して図4の結果を得る(S54)。
【0052】
結果選択部130は、1回目と2回目の全ての結果の信頼度の大きい順に選択する(S55)。図4を参照すれば、1回目の「ひらい」の信頼度が0.275と最大で、2番目が1回目の「いわい」の信頼度の0.273となる。3番目以降は1回目の「いまい」、1回目の「しらい」、2回目の「ひらい」、2回目の「しらい」と続く。
【0053】
結果フィルタリング部160は、誤認識結果蓄積部150に蓄積されている「ひらい」を選択された結果から取り除く(S56)。すなわち、図4に取り消し線で示された「ひらい」が取り除かれ、1回目の「いわい」の信頼度が最大となる。結果提示部170は「いわい」を認識結果としてユーザに提示する(S57)。ユーザは提示結果が正解であることを入力するので、結果提示部170は認識結果蓄積部110と誤認識結果蓄積部150とに蓄積されている認識結果と誤認識結果の情報をクリアし(S60)、新たな内容のユーザ発声に備える。
【0054】
このように、上記具体例では正解の「いわい」は1回目の発声にしか含まれないため、従来のように1回目と2回目の結果を乗算してしまうと「いわい」の合成確率は”0”となり正解を得ることができない。また、1回目の結果を使用せず誤認識結果を2回目の認識結果から取り除いても、やはり2回目の発声の上位に正解が含まれないため正解が得られない。
【0055】
本手法では1回目と2回目の結果をそれぞれ正規化し信頼度を計算し、1回目と2回目の認識結果候補に対して信頼度の高い順に選択し、さらに誤認識結果を反映させることにより正解を得ることができる。我々は本手法について実際に音声認識実験を行った。日本のほとんどの姓を対象とした認識実験を行い、一回目が誤認識であった発声について同内容の発声をもう一度行ってもらった場合の認識率において、2回目の発声の認識結果から1回目の誤認識結果を取り除く従来の方法で52.8%であった認識率を、56.5%に向上させることができた。このように、従来ある信頼度とこれまでの誤認識結果を取り除く方法の単純な組み合わせでは得られないような顕著な効果が本手法により得られている。
【0056】
次に、本発明の第2の実施の形態について説明する。図5は本発明の第2の実施の形態の音声認識装置20の構成を示すブロック図である。図5を参照すると、音声認識装置20は本発明の第1の実施の形態の音声認識装置10と比べて結果フィルタリング部260と結果選択部230の実行手順が結果フィルタリング部160と結果選択部130と異なる。なお、図5では図1と同じ機能の構成要素は図1と同じ符号を付しているのでこれらの構成要素の説明は省略する。
【0057】
結果フィルタリング部260は、信頼度計算部120により信頼度が計算された認識結果候補に対して誤認識結果蓄積部150に蓄積された誤認識結果を除去する。結果選択部230は、誤認識結果が除去された認識結果候補をまとめて信頼度の高い順に選択する。結果フィルタリング部260と結果選択部230は通常プログラムで実現される。
【0058】
図6は本発明の第2の実施の形態の動作を示したフローチャートである。図6のフローチャートを図2に示すフローチャートと比べると、図2のステップS51〜S54、S57〜S60は図6のステップS61〜S64、S67〜S70にそれぞれ相当し、図2のステップS55、S56の実行順序が図6ではステップS65,S66で逆になっている。すなわち、図2では先に信頼度に基づいて認識結果候補を選択して(S55)から誤認識結果を除去(S56)しているが、図6では先に結果フィルタリング部260で認識結果候補から誤認識結果を除去して(S65)から結果選択部230で認識結果候補を選択する(S66)。
【0059】
このように本発明の第2の実施の形態では、第1の実施の形態と同じ効果を得るとともに、結果選択部230で選択を行う前に誤認識を除去した認識結果候補の順位情報を利用することができる。例えば、誤認識を除去した各発声の最上位認識結果候補同士を信頼度に基づいて比較するなどである。
【0060】
次に、本発明の第3の実施の形態について説明する。図7は本発明の第3の実施の形態の音声認識装置30の機能ブロック図である。図7を参照すると、音声認識装置30は、本発明の第1の実施の形態の音声認識装置10と比べて相加平均計算部380が新たに追加されている。なお、図1と同じ機能の構成要素については同じ符号を付しているのでこれらの構成要素の説明は省略する。
【0061】
相加平均計算部380は、信頼度計算部120により信頼度が計算された認識結果候補に対して発声毎の信頼度の相加平均を新たな信頼度として付与する。相加平均計算部380は通常プログラムで実現され、図示しないが音声認識装置30の演算回路を制御して相加平均の計算を実行する。
【0062】
相加平均の計算において、発声の中に該当する認識結果候補が含まれない場合は、信頼度を0として相加平均を求めてもよく、また該当する認識結果候補が含まれる発声のみで平均を求めてもよい。例えば、全部で5つの発声があって認識結果候補が2発声にしか含まれていなければその2発声分で信頼度の平均を求めればよい。
【0063】
図8は本発明の第3の実施の形態の動作を示したフローチャートである。図8のフローチャートを図2に示すフローチャートと比べると、図2のステップS51〜S60は図8のステップS71〜S80にそれぞれ相当し、図8ではステップS74とS75の間にステップS81が追加される。ステップS81では、ステップS74で計算された各発声の各認識結果候補の信頼度について、相加平均計算部380が、同じ認識結果候補の信頼度の相加平均を計算し、この値を新たに信頼度とする。
【0064】
例えば、図4の信頼度に対して認識結果候補の出現回数の相加平均を取る場合は、「ひらい」は(0.275+0.176)/2=0.226となり、「いわい」は2回目には出現しないため1回目の値のままとなり、「しらい」は(0.214+0.126)/2=0.170となる。この結果0.273の「いわい」が最大となり、2回目の発声にして正解となる認識結果を得ることができる。
【0065】
また、図4の信頼度に対して発声回数で相加平均を取る場合は、「ひらい」は(0.275+0.176)/2=0.226、「いわい」は(0.273+0)/2=0.137、「しらい」は(0.214+0.126)/2=0.170となる。この結果0.226の「ひらい」が最大で次が0.170の「しらい」となるが、ステップS76において1回目の結果で誤認識となった「ひらい」が除去されるので「しらい」が選択される。この場合は2回目で正解となる認識結果が得られないため、「しらい」は新たに誤認識とユーザに判定されて誤認識結果蓄積部150に蓄積され、3回目の発声の処理に入ることになる。
【0066】
上記の例では、各発声に対して同じ重みとして相加平均を計算しているが、相加平均する際に発声の時間順に従って重み付けをして相加平均を計算することもできる。例えば、図4で2回目の信頼度には”1”の重みを掛け、1回目の信頼度には0〜1の間の数値(例えば”0.8”)の重みを掛けるような手法であるが、重みの設定は限定しない。また、相加平均の計算も上記2例以外でもよく限定するものではない。
【0067】
以上説明した本発明の第1、第2、又は第3の構成の他に、第2の実施の形態の構成に対して相加平均計算部380の機能を追加した新たな構成も容易に実現できることは明らかであり、どの構成においても各発声の尤度を乗算することはしていないために、正解が認識結果候補として得られない発声があっても、正解を得ることが可能である。
【0068】
【発明の効果】
本発明によれば、新たに得られた発声の認識結果を正規化したスコア(信頼度)と、以前の発声の認識結果を正規化したスコアとを乗算せずに合成確率を求めるようにしたので、正解が全ての発声に出現しない場合でも正解を得ることができるようになり、認識率を改善できるという効果がある。
【図面の簡単な説明】
【図1】本発明の第1の実施の形態の構成を示すブロック図である。
【図2】本発明の第1の実施の形態の動作を示すフローチャートである。
【図3】本発明の第1の実施の形態の尤度の具体例を示す図である。
【図4】本発明の第1の実施の形態の信頼度の具体例を示す図である。
【図5】本発明の第2の実施の形態の構成を示すブロック図である。
【図6】本発明の第2の実施の形態の動作を示すフローチャートである。
【図7】本発明の第3の実施の形態の構成を示すブロック図である。
【図8】本発明の第3の実施の形態の動作を示すフローチャートである。
【図9】従来の技術の説明のための具体例を示す図である。
【符号の説明】
10 音声認識装置
100 認識部
110 認識結果蓄積部
120 信頼度計算部
130 結果選択部
140 修正部
150 誤認識結果蓄積部
160 結果フィルタリング部
170 結果提示部
20 音声認識装置
230 結果選択部
260 結果フィルタリング部
30 音声認識装置
380 相加平均計算部
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a speech recognition device, a speech recognition method, and a program, and in particular to a speech recognition technique that, when a user's utterance has been misrecognized and an utterance with the same content is input again, uses both the recognition result of the misrecognized utterance and the recognition result of the re-input utterance.
[0002]
[Prior art]
According to Japanese Patent Application Laid-Open No. Hei 10-133684, which shows an example of a conventional speech recognition apparatus, the apparatus uses previously recognized recognition result candidates and newly recognized recognition result candidates to select the recognition result with the highest probability of matching each utterance (excluding erroneous recognition results). Specifically, to find the highest probability, it detects the candidates common to both sets of recognition result candidates and selects among them using a composite probability obtained by multiplying their probabilities.
[0003]
As a specific example, FIG. 9 shows the recognition result candidates, and the probability corresponding to each, obtained when a speaker utters "make" twice. Referring to FIG. 9, if the decision is based only on the newly recognized result for the second utterance, "Fake" is erroneously selected because its probability of 0.4 is the highest. Even if "Fake", which was erroneously recognized for the first utterance, is removed to reflect that result, "Mace", which has the next highest probability of 0.3, is erroneously selected.
[0004]
However, when the misrecognized "Fake" is removed and the candidates are compared by the composite probability over the previous and current utterances, "Make" is 0.06 (= 0.3 × 0.2), "Mace" is 0.03 (= 0.1 × 0.3), and "Bake" is 0.01 (= 0.1 × 0.1), so "Make", which has the highest composite probability, can be correctly selected.
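The prior-art selection rule described here can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not code from the cited publication: the function name `composite_by_product` is hypothetical, the second-utterance values 0.4 for "Fake" and 0.3 for "Mace" come from the text, the remaining per-utterance values are inferred from the quoted products 0.06, 0.03, and 0.01, and the first-utterance value 0.5 for "Fake" is a placeholder that does not affect the outcome because "Fake" is excluded.

```python
def composite_by_product(first, second, rejected=()):
    """Multiply the probabilities of candidates common to both utterances,
    skipping candidates already rejected as misrecognitions."""
    common = (first.keys() & second.keys()) - set(rejected)
    return {word: first[word] * second[word] for word in common}


# Probabilities per utterance (partly inferred, partly assumed; see lead-in).
first = {"Fake": 0.5, "Make": 0.3, "Mace": 0.1, "Bake": 0.1}
second = {"Fake": 0.4, "Mace": 0.3, "Make": 0.2, "Bake": 0.1}

scores = composite_by_product(first, second, rejected={"Fake"})
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Under these assumed inputs, the second utterance alone would yield "Mace" once "Fake" is removed, whereas the product over both utterances ranks "Make" first.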
[0005]
Further, Japanese Patent Application Laid-Open No. 2000-250585 describes that, when determining the voice search key to be used in a database search, the likelihood of a voice search key candidate for the input speech (e.g., a municipality name) and the likelihood of related information, obtained by recognizing the spoken response to a question about related information for an attribute item of the voice search key (e.g., a prefecture name), are each normalized and then multiplied to give a recognition likelihood. This method normalizes the likelihoods, but not for the purpose of reconciling them with a previous misrecognized result, and it still uses the product of the two likelihoods as the recognition likelihood.
[0006]
[Patent Document 1]
JP-A-10-133684 (page 5)
[Patent Document 2]
JP 2000-250585 A
[0007]
[Problems to be solved by the invention]
The above-described speech recognition apparatus that refers to the previous result has the problem that the correct answer cannot be obtained unless it appears in both the previously recognized result and the currently recognized result. That is, when the correct answer appears in only one of them, its composite probability becomes 0 as a result of the multiplication, which is the minimum value, so it is not selected. This problem is also unsolved in the speech recognition apparatus that multiplies the result for the input speech by the result for the input of the related information.
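The failure mode can be made concrete with a short Python sketch. The scores below are the reliability values quoted in the description's worked example (FIG. 4, as cited in paragraphs [0052] and [0064]), used here only as illustrative per-utterance scores. The candidate names are romanized, candidates whose values the text does not state are omitted, and the helper `product_score` is a hypothetical name.

```python
def product_score(word, utterances):
    """Composite score by multiplication; a candidate missing from an
    utterance contributes a factor of 0."""
    score = 1.0
    for candidates in utterances:
        score *= candidates.get(word, 0.0)
    return score


# The correct answer "iwai" appears only in the first utterance.
first = {"hirai": 0.275, "iwai": 0.273, "shirai": 0.214}
second = {"hirai": 0.176, "shirai": 0.126}

words = first.keys() | second.keys()
products = {w: product_score(w, [first, second]) for w in words}
print(products["iwai"])
```

Because "iwai" is absent from the second utterance, its product is zero and it can never be selected, however high its score in the first utterance.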
[0008]
An object of the present invention is to provide a speech recognition device, a speech recognition method, and a program that improve speech recognition performance by referring to both the previously obtained and the currently obtained speech recognition results, while still being able to obtain the correct answer even when it does not appear in both.
[0009]
[Means for Solving the Problems]
The first speech recognition apparatus of the present invention is characterized in that, when a plurality of utterances with the same content are input, it selects a recognition result from among the recognition result candidates of the utterances using values obtained by normalizing the probability values indicating the certainty of the recognition result candidates so that they are comparable across utterances.
[0010]
The second speech recognition device of the present invention is characterized in that, when an utterance with the same content is input a plurality of times, it normalizes, for each utterance, the likelihood indicating the certainty of each recognition result candidate obtained as a result of speech recognition for the utterance to calculate a reliability, and selects a recognition result from among the recognition result candidates of the utterances based on the reliability.
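A minimal Python sketch of this normalization follows. It uses the formula given later in the description (paragraphs [0039] and [0040]), P(k, m) = exp(L(k, m)) / Z(m) with Z(m) the sum of exp(L(k, m)) over the candidates of utterance m. It assumes L behaves as a log-domain score, as the exponential suggests. The function names, the candidate labels, and the likelihood values are all made up for illustration, and subtracting the per-utterance maximum before exponentiating is an implementation choice for numerical stability that leaves the result unchanged.

```python
import math


def reliabilities(log_likelihoods):
    """Normalize one utterance: P(k, m) = exp(L(k, m)) / Z(m)."""
    peak = max(log_likelihoods.values())
    exps = {k: math.exp(v - peak) for k, v in log_likelihoods.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def select_across_utterances(utterances):
    """Pool the candidates of all utterances, sorted by reliability (highest first)."""
    pooled = [
        (rel, m, cand)
        for m, scores in enumerate(utterances, start=1)
        for cand, rel in reliabilities(scores).items()
    ]
    return sorted(pooled, reverse=True)


# Hypothetical raw scores: the two utterances are on very different scales.
utterances = [
    {"a": -10.0, "b": -10.5, "c": -12.0},
    {"a": -20.0, "d": -20.1},
]
ranking = select_across_utterances(utterances)
for rel, m, cand in ranking:
    print(f"utterance {m}: {cand} {rel:.3f}")
```

The raw scores of the second utterance are all lower than those of the first, yet after normalization each utterance sums to 1, so candidates from different utterances can be ranked in one list.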
[0011]
The third speech recognition device of the present invention is characterized in that, when an utterance with the same content is input a plurality of times, it normalizes, for each utterance, the likelihood indicating the certainty of each recognition result candidate obtained as a result of speech recognition for the utterance to calculate a reliability, and selects a recognition result based on the reliability from among the recognition result candidates that remain after removing the candidates judged to be misrecognitions up to the previous utterance.
[0012]
A fourth speech recognition apparatus of the present invention comprises: a recognition unit that recognizes speech for each utterance and calculates a likelihood indicating the certainty of each of one or more recognition result candidates; a recognition result accumulating unit that stores the recognition result candidates and their likelihoods for each utterance; a reliability calculating unit that calculates, for each utterance, a reliability, which is a score normalized based on the likelihoods of the recognition result candidates; and a result selecting unit that selects a recognition result from among the stored recognition result candidates of the utterances based on the reliability.
[0013]
A fifth speech recognition apparatus of the present invention comprises: a recognition unit that recognizes speech for each utterance and calculates a likelihood indicating the certainty of each of one or more recognition result candidates; a recognition result accumulating unit that stores the recognition result candidates and their likelihoods for each utterance; a reliability calculating unit that calculates, for each utterance, a reliability, which is a score normalized based on the likelihoods of the recognition result candidates; a result selecting unit that selects a recognition result from among the stored recognition result candidates of the utterances based on the reliability; an erroneous recognition information storage unit that stores erroneous recognition results, that is, recognition results for utterances up to the previous one that were judged to be misrecognitions; and a result filtering unit that removes the erroneous recognition results stored in the erroneous recognition information storage unit from the recognition results selected by the result selecting unit and reselects a recognition result.
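The select-then-filter order of this configuration can be sketched as follows. This is a simplified illustration, not the patented implementation: the function name `rank_then_filter` is hypothetical, the inputs are already-normalized reliabilities, and the values are those quoted for FIG. 4 in the description's worked example (paragraphs [0052] and [0064]), with romanized candidate names and unstated candidates omitted.

```python
def rank_then_filter(utterances, misrecognized):
    """Rank the pooled candidates of all utterances by reliability,
    then drop those already judged to be misrecognitions."""
    ranked = sorted(
        (
            (rel, m, cand)
            for m, scores in enumerate(utterances, start=1)
            for cand, rel in scores.items()
        ),
        reverse=True,
    )
    return [(cand, m, rel) for rel, m, cand in ranked if cand not in misrecognized]


first = {"hirai": 0.275, "iwai": 0.273, "shirai": 0.214}
second = {"hirai": 0.176, "shirai": 0.126}

misrecognized = set()

# First utterance: nothing has been rejected yet, so the top candidate is shown.
shown_first = rank_then_filter([first], misrecognized)[0]
misrecognized.add(shown_first[0])  # the user rejects it

# Second utterance: both utterances are pooled, rejected candidates removed.
shown_second = rank_then_filter([first, second], misrecognized)[0]
print(shown_first, shown_second)
```

With these values the rejected "hirai" is removed after the second utterance and "iwai" from the first utterance becomes the top candidate, even though it does not appear in the second utterance at all.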
[0014]
A sixth speech recognition apparatus of the present invention comprises: a recognition unit that recognizes speech for each utterance and calculates a likelihood indicating the certainty of each of one or more recognition result candidates; a recognition result accumulating unit that stores the recognition result candidates and their likelihoods for each utterance; a reliability calculating unit that calculates, for each utterance, a reliability, which is a score normalized based on the likelihoods of the recognition result candidates; an erroneous recognition information storage unit that stores erroneous recognition results, that is, recognition results for utterances up to the previous one that were judged to be misrecognitions; a result filtering unit that removes the erroneous recognition results from the recognition result candidates of each utterance; and a result selecting unit that selects a recognition result from among the recognition result candidates of the utterances that remain after the erroneous recognition results are removed.
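Filtering before selection means the selection step can work with per-utterance rankings from which misrecognitions have already been removed. The sketch below follows the example mentioned in paragraph [0059] of the description, comparing the top remaining candidate of each utterance by reliability. The function name `filter_then_select` is hypothetical and the reliability values are again those quoted for FIG. 4, so treat it as one possible reading rather than the definitive procedure.

```python
def filter_then_select(utterances, misrecognized):
    """Remove misrecognized candidates from each utterance first, then
    compare the top remaining candidate of each utterance by reliability."""
    tops = []
    for m, scores in enumerate(utterances, start=1):
        kept = {c: r for c, r in scores.items() if c not in misrecognized}
        if kept:
            best = max(kept, key=kept.get)
            tops.append((kept[best], m, best))
    return max(tops) if tops else None


first = {"hirai": 0.275, "iwai": 0.273, "shirai": 0.214}
second = {"hirai": 0.176, "shirai": 0.126}

choice = filter_then_select([first, second], misrecognized={"hirai"})
print(choice)
```

For this data the outcome matches the select-then-filter order; the difference is that rank information after filtering is available to the selection step.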
[0015]
The seventh speech recognition device of the present invention is characterized in that, when an utterance with the same content is input a plurality of times, it normalizes, for each utterance, the likelihood indicating the certainty of each recognition result candidate obtained as a result of speech recognition for the utterance to calculate a reliability, further obtains a composite reliability by taking, over the recognition result candidates of all utterances, the arithmetic mean for each identical recognition result candidate, and selects a recognition result based on the composite reliability from among the recognition result candidates that remain after removing the candidates judged to be misrecognitions up to the previous utterance.
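The description (paragraphs [0062] to [0065]) mentions two ways to take this arithmetic mean when a candidate is missing from some utterance: average only over the utterances in which it occurs, or average over all utterances with a reliability of 0 where it is absent. The Python sketch below implements both with the FIG. 4 reliabilities quoted there. The function names are hypothetical and the candidate names are romanized.

```python
def mean_over_occurrences(utterances):
    """Average each candidate over the utterances in which it appears."""
    totals, counts = {}, {}
    for scores in utterances:
        for cand, rel in scores.items():
            totals[cand] = totals.get(cand, 0.0) + rel
            counts[cand] = counts.get(cand, 0) + 1
    return {c: totals[c] / counts[c] for c in totals}


def mean_over_utterances(utterances):
    """Average each candidate over all utterances, counting absences as 0."""
    n = len(utterances)
    totals = {}
    for scores in utterances:
        for cand, rel in scores.items():
            totals[cand] = totals.get(cand, 0.0) + rel
    return {c: t / n for c, t in totals.items()}


def best_unrejected(scores, misrecognized):
    """Pick the highest composite reliability among non-rejected candidates."""
    kept = {c: s for c, s in scores.items() if c not in misrecognized}
    return max(kept, key=kept.get)


first = {"hirai": 0.275, "iwai": 0.273, "shirai": 0.214}
second = {"hirai": 0.176, "shirai": 0.126}
rejected = {"hirai"}

by_occurrence = mean_over_occurrences([first, second])
by_utterance = mean_over_utterances([first, second])
print(best_unrejected(by_occurrence, rejected))
print(best_unrejected(by_utterance, rejected))
```

The two conventions lead to different selections on this data, "iwai" for the first and "shirai" for the second, which agrees with the outcomes reported in the description's worked example.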
[0016]
An eighth speech recognition apparatus of the present invention comprises: a recognition unit that recognizes speech for each utterance and calculates a likelihood indicating the certainty of each of one or more recognition result candidates; a recognition result accumulating unit that stores the recognition result candidates and their likelihoods for each utterance; a reliability calculating unit that calculates, for each utterance, a reliability, which is a score normalized based on the likelihoods of the recognition result candidates; an arithmetic mean calculating unit that obtains a composite reliability by taking, over the recognition result candidates of all utterances, the arithmetic mean for each identical recognition result candidate; a result selecting unit that selects a recognition result from among the stored recognition result candidates of the utterances based on the composite reliability; an erroneous recognition information storage unit that stores erroneous recognition results, that is, recognition results for utterances up to the previous one that were judged to be misrecognitions; and a result filtering unit that removes the erroneous recognition results stored in the erroneous recognition information storage unit from the recognition results selected by the result selecting unit and reselects.
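Paragraph [0066] of the description adds that the arithmetic mean may be weighted by the time order of the utterances, for example a weight of 1 for the second utterance and a value between 0 and 1, such as 0.8, for the first. The sketch below is one possible reading of that remark. Dividing by the sum of the weights of the utterances in which the candidate occurs is an assumption of this example, since the text leaves the weighting scheme open, and `weighted_mean_reliability` is a hypothetical name.

```python
def weighted_mean_reliability(utterances, weights):
    """Weighted arithmetic mean of each candidate's reliability, taken over
    the utterances in which the candidate occurs (assumed normalization)."""
    totals, weight_sums = {}, {}
    for scores, w in zip(utterances, weights):
        for cand, rel in scores.items():
            totals[cand] = totals.get(cand, 0.0) + w * rel
            weight_sums[cand] = weight_sums.get(cand, 0.0) + w
    return {c: totals[c] / weight_sums[c] for c in totals}


first = {"hirai": 0.275, "iwai": 0.273, "shirai": 0.214}
second = {"hirai": 0.176, "shirai": 0.126}

# Older utterance down-weighted to 0.8, newest utterance kept at 1.0.
weighted = weighted_mean_reliability([first, second], weights=[0.8, 1.0])
for cand, rel in sorted(weighted.items(), key=lambda item: -item[1]):
    print(f"{cand}: {rel:.3f}")
```

With equal weights this reduces to the unweighted mean over occurrences, so the weighting can be seen as a tunable extension of the arithmetic mean calculating unit rather than a separate mechanism.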
[0017]
The first speech recognition method of the present invention is characterized in that, when a plurality of utterances with the same content are input, a recognition result is selected from among the recognition result candidates of the utterances using values obtained by normalizing the probability values indicating the certainty of the recognition result candidates so that they are comparable across utterances.
[0018]
A second speech recognition method of the present invention is characterized in that, when an utterance with the same content is input a plurality of times, the likelihood indicating the certainty of each recognition result candidate obtained as a result of speech recognition for the utterance is normalized for each utterance to calculate a reliability, and a recognition result is selected from among the recognition result candidates of the utterances based on the reliability.
[0019]
A third speech recognition method of the present invention is characterized in that, when an utterance with the same content is input a plurality of times, the likelihood indicating the certainty of each recognition result candidate obtained as a result of speech recognition for the utterance is normalized for each utterance to calculate a reliability, and a recognition result is selected based on the reliability from among the recognition result candidates that remain after removing the candidates judged to be misrecognitions up to the previous utterance.
[0020]
A fourth speech recognition method of the present invention is characterized by recognizing speech for each utterance, calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates, calculating, for each utterance, a reliability, which is a score normalized based on the likelihoods of the recognition result candidates, and selecting a recognition result from among the stored recognition result candidates of the utterances based on the calculated reliability.
[0021]
A fifth speech recognition method of the present invention is characterized by recognizing speech for each utterance, calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates, calculating, for each utterance, a reliability, which is a score normalized based on the likelihoods of the recognition result candidates, selecting a recognition result from among the stored recognition result candidates of the utterances based on the calculated reliability, and removing from the selected recognition results the stored erroneous recognition results, that is, recognition results for utterances up to the previous one that were judged to be misrecognitions, and reselecting a recognition result.
[0022]
A sixth speech recognition method of the present invention is characterized by recognizing speech for each utterance, calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates, calculating, for each utterance, a reliability, which is a score normalized based on the likelihoods of the recognition result candidates, removing from the stored recognition result candidates the stored erroneous recognition results, that is, recognition results for utterances up to the previous one that were judged to be misrecognitions, and selecting a recognition result from among the recognition result candidates of the utterances that remain after the erroneous recognition results are removed.
[0023]
A seventh speech recognition method of the present invention is characterized in that, when an utterance with the same content is input a plurality of times, the likelihood indicating the certainty of each recognition result candidate obtained as a result of speech recognition for the utterance is normalized for each utterance to calculate a reliability, a composite reliability is further obtained by taking, over the recognition result candidates of all utterances, the arithmetic mean for each identical recognition result candidate, and a recognition result is selected based on the composite reliability from among the recognition result candidates that remain after removing the candidates judged to be misrecognitions up to the previous utterance.
[0024]
An eighth speech recognition method of the present invention is characterized by recognizing speech for each utterance, calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates, calculating, for each utterance, a reliability, which is a score normalized based on the likelihoods of the recognition result candidates, obtaining a composite reliability by taking, over the recognition result candidates of all utterances, the arithmetic mean for each identical recognition result candidate, selecting a recognition result from among the stored recognition result candidates of the utterances based on the composite reliability, and removing from the selected recognition results the stored erroneous recognition results, that is, recognition results for utterances up to the previous one that were judged to be misrecognitions, and reselecting.
[0025]
The first program of the present invention causes a computer to execute a procedure that, when a plurality of utterances with the same content are input, selects a recognition result from among the recognition result candidates of the utterances using values obtained by normalizing the probability values indicating the certainty of the recognition result candidates so that they are comparable across utterances.
[0026]
A second program of the present invention causes a computer to execute: a procedure for, when an utterance with the same content is input a plurality of times, normalizing for each utterance the likelihood indicating the certainty of each recognition result candidate obtained by speech recognition of the utterance, thereby calculating a reliability; and a procedure for selecting a recognition result from the recognition result candidates of the utterances on the basis of the reliability.
[0027]
A third program of the present invention causes a computer to execute: a procedure for, when an utterance with the same content is input a plurality of times, normalizing for each utterance the likelihood indicating the certainty of each recognition result candidate obtained by speech recognition of the utterance, thereby calculating a reliability; and a procedure for selecting a recognition result, on the basis of the reliability, from the recognition result candidates that remain after removing the candidates judged to be misrecognitions up to the previous utterance.
[0028]
A fourth program of the present invention causes a computer to execute: a procedure for recognizing speech for each utterance and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; a procedure for calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; and a procedure for selecting a recognition result from the stored recognition result candidates of the utterances on the basis of the calculated reliability.
[0029]
A fifth program of the present invention causes a computer to execute: a procedure for recognizing speech for each utterance and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; a procedure for calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; a procedure for selecting a recognition result from the stored recognition result candidates of the utterances on the basis of the calculated reliability; and a procedure for removing from the selected recognition results the stored misrecognition results, that is, recognition results for utterances up to the previous one that were judged to be misrecognitions, and reselecting the recognition result.
[0030]
A sixth program of the present invention causes a computer to execute: a procedure for recognizing speech for each utterance and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; a procedure for calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; a procedure for removing from the stored recognition result candidates the stored misrecognition results, that is, recognition results for utterances up to the previous one that were judged to be misrecognitions; and a procedure for selecting a recognition result from the recognition result candidates of the utterances that remain after the misrecognition results are removed.
[0031]
A seventh program of the present invention causes a computer to execute: a procedure for, when an utterance with the same content is input a plurality of times, normalizing for each utterance the likelihood indicating the certainty of each recognition result candidate obtained by speech recognition of the utterance, thereby calculating a reliability; a procedure for obtaining a composite reliability by taking the arithmetic mean for each identical recognition result candidate across the recognition result candidates of all utterances; and a procedure for selecting a recognition result, on the basis of the composite reliability, from the recognition result candidates that remain after removing the candidates judged to be misrecognitions up to the previous utterance.
[0032]
An eighth program of the present invention causes a computer to execute: a procedure for recognizing speech for each utterance and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; a procedure for calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; a procedure for obtaining a composite reliability by taking the arithmetic mean for each identical recognition result candidate across the recognition result candidates of all utterances; a procedure for selecting a recognition result from the stored recognition result candidates of the utterances on the basis of the composite reliability; and a procedure for removing from the selected recognition results the stored misrecognition results, that is, recognition results for utterances up to the previous one that were judged to be misrecognitions, and reselecting the recognition result.
[0033]
BEST MODE FOR CARRYING OUT THE INVENTION
Next, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing a first embodiment of the present invention. Referring to FIG. 1, a speech recognition device 10 according to the first embodiment includes a recognition unit 100, a reliability calculation unit 120, a result selection unit 130, a correction unit 140, a result filtering unit 160, and a result presentation unit 170, each realized by or including a program, together with a recognition result storage unit 110 and a misrecognition result storage unit 150 provided in a storage unit.
[0034]
The recognition unit 100 processes the speech input by the user's utterance with a predetermined method and outputs one or more recognition result candidates together with a likelihood indicating the certainty of each candidate. The predetermined method is not particularly limited. The recognition result storage unit 110 is a storage unit that stores the recognition result candidates and the likelihoods output by the recognition unit 100.
[0035]
The reliability calculation unit 120 normalizes the likelihoods of the recognition result candidates for each utterance of the user and calculates a score called the reliability. The reliability can be calculated independently for each utterance, and its values can be compared between utterances. The result selection unit 130 pools the recognition result candidates of the plurality of utterances and selects results in descending order of reliability.
[0036]
The result presentation unit 170 presents to the user the recognition result that the speech recognition device 10 obtained for the user's utterance. The result may, for example, be shown on a display device or announced by voice, but the presentation method is not particularly limited. The correction unit 140 lets the user judge whether the speech recognition result presented by the result presentation unit 170 is correct and point out a misrecognition. The user may point out a misrecognition by, for example, operating input means such as a keyboard or touch panel, pressing a predetermined button, or saying "yes" or "no", but the method is not particularly limited.
[0037]
The misrecognition result accumulation unit 150 is a storage unit that accumulates misrecognition results indicated by the user. The result filtering unit 160 has a function of removing the erroneous recognition results stored in the erroneous recognition result storage unit 150 from the recognition result candidates whose results have been selected in descending order of reliability by the result selection unit 130.
[0038]
Next, the operation of the first embodiment of the present invention will be described in detail with reference to the drawings. FIG. 2 is a flowchart showing the operation of the first embodiment. First, when speech uttered by the user is input (S51), the recognition unit 100 performs speech recognition and obtains recognition result candidates and a likelihood indicating the certainty of each candidate (S52). The recognition result candidates and their likelihoods are stored in the recognition result storage unit 110 (S53).
[0039]
Next, the reliability calculation unit 120 calculates the reliability of each recognition result candidate for all the results stored in the recognition result storage unit 110 (S54). In practice, because the reliability can be calculated independently for each utterance, the calculation is carried out separately on the result of each utterance. The reliability P (candidate k, utterance m) is calculated as follows.
P (candidate k, utterance m) = exp (L (candidate k, utterance m)) / Z (utterance m)
Here, L (candidate k, utterance m) is the likelihood of the k-th recognition result candidate of the m-th utterance, and Z (utterance m) is a normalization coefficient that makes the sum of P over all recognition result candidates of the m-th utterance equal to 1.
[0040]
Z (utterance m) is calculated as follows.
Z (utterance m) = Σ_k exp (L (candidate k, utterance m))
where the sum runs over all recognition result candidates k of utterance m.
L is the likelihood calculated by the recognition unit 100. It may also be, for example, normalized by the utterance length, or have added to it, with appropriate weighting, a language score that is learned from text data and indicates how plausible the recognition result is.
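The normalization above amounts to a softmax over the likelihoods of one utterance. The following is a minimal Python sketch, assuming that L is a log-domain score held in a dictionary keyed by candidate. The function name `reliabilities` and the example likelihood values are illustrative assumptions and are not the values of FIG. 3. Subtracting the maximum before exponentiating is a numerical-stability choice of this sketch and does not change the resulting P.

```python
import math

def reliabilities(log_likelihoods):
    """Map {candidate: L} for one utterance to {candidate: P}; the P values sum to 1."""
    # Shifting by the maximum leaves every ratio exp(L) / Z unchanged
    # but avoids overflow or underflow for large-magnitude likelihoods.
    peak = max(log_likelihoods.values())
    weights = {cand: math.exp(l - peak) for cand, l in log_likelihoods.items()}
    z = sum(weights.values())  # Z(utterance m), up to the shared shift factor
    return {cand: w / z for cand, w in weights.items()}

# Hypothetical likelihoods for a single utterance (assumed values).
example = reliabilities({"Hirai": -10.0, "Iwai": -10.2, "Shirai": -11.0})
```

Because each utterance is normalized on its own, the resulting values can be compared between utterances even when the raw likelihoods are on different scales.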
[0041]
The result selection unit 130 selects recognition result candidates in descending order of the reliability calculated by the reliability calculation unit 120 (S55). By default, all recognition result candidates stored in the recognition result storage unit 110 are targeted. Alternatively, only the top n candidates of each utterance may be targeted, only the results of the most recent several utterances may be targeted, or these restrictions may be combined.
[0042]
The result filtering unit 160 removes, from the recognition result candidates selected by the result selection unit 130, those that match a misrecognition result stored in the misrecognition result storage unit 150 (S56). Finally, one or several recognition result candidates are taken as recognition results, starting from the candidate with the highest reliability, and presented to the user by the result presentation unit 170 (S57).
[0043]
The user judges whether the presented result is correct and inputs that judgment (S58). When the user indicates that all presented results are misrecognitions, the correction unit 140 stores the recognition result candidates presented this time in the misrecognition result storage unit 150 (S59), and processing moves on to the input of the user's next utterance. The presented result is not necessarily a single candidate; when several recognition results are presented, several misrecognition results are stored in step S59.
[0044]
If the result is not a misrecognition, that is, if speech recognition has succeeded, the result presentation unit 170 initializes the misrecognition result storage unit 150 and the recognition result storage unit 110 (S60), and speech recognition for the utterance content in question ends. Speech recognition for the user's next utterance content can then be started as needed.
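The flow of steps S53 to S60 can be sketched as a small session object. This is only an illustration under stated assumptions: the class `RepeatedUtteranceSession` and its method names are hypothetical, the recognizer itself (S51, S52) is outside the sketch and is replaced by likelihood dictionaries passed in by the caller, the likelihoods are assumed to be log-domain scores, and only a single result is presented per utterance.

```python
import math

class RepeatedUtteranceSession:
    """Hypothetical stand-in for units 110 to 160; not the patent's implementation."""

    def __init__(self):
        self.stored = []            # recognition result storage (110): one dict per utterance
        self.misrecognized = set()  # misrecognition result storage (150)
        self.presented = None

    def add_utterance(self, log_likelihoods):
        # S53: store the candidates and likelihoods of the new utterance.
        self.stored.append(dict(log_likelihoods))
        # S54: normalize each utterance separately to obtain reliabilities.
        scored = []
        for utterance in self.stored:
            z = sum(math.exp(l) for l in utterance.values())
            scored += [(math.exp(l) / z, cand) for cand, l in utterance.items()]
        # S55: order all candidates of all utterances by reliability.
        scored.sort(reverse=True)
        # S56: drop candidates the user has already rejected.
        kept = [cand for _, cand in scored if cand not in self.misrecognized]
        # S57: present the best remaining candidate (None if nothing is left).
        self.presented = kept[0] if kept else None
        return self.presented

    def reject(self):
        # S59: the user flagged the presented result as a misrecognition.
        if self.presented is not None:
            self.misrecognized.add(self.presented)

    def accept(self):
        # S60: success, so both stores are cleared for the next utterance content.
        self.stored.clear()
        self.misrecognized.clear()
```

A caller would invoke `add_utterance` once per repetition, then `reject` or `accept` depending on the user's judgment in S58.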
[0045]
As described above, because the speech recognition device 10 according to the first embodiment of the present invention has the reliability calculation unit 120, a reliability can be calculated for each recognition result candidate of each utterance, and its value can be compared directly with the reliabilities of the results of other utterances. Consequently, even if not every utterance includes the correct answer among its candidates, an appropriate recognition result can be selected according to the reliability, and better recognition accuracy is obtained.
[0046]
Next, the operation of the present embodiment will be described using a specific example, in which the user utters "Iwai" twice. First, when the user utters "Iwai" (S51), the recognition unit 100 executes speech recognition and obtains recognition result candidates and their likelihoods (S52).
[0047]
FIG. 3 is a diagram showing the recognition result candidates of the first and second utterances and their likelihoods. The first result of FIG. 3 is stored in the recognition result storage unit 110 (S53). The strike-through line is drawn only to explain that the recognition result of the first utterance was a misrecognition and is removed at the time of the second utterance; it is not part of the stored result. In FIG. 3, the likelihood of "Hirai" is the highest for the first utterance, followed by that of the correct answer "Iwai".
[0048]
The reliability calculation unit 120 performs the normalization and obtains the first result shown in FIG. 4 (S54). FIG. 4 is a diagram showing, for each recognition result candidate, the reliability which is a value obtained by normalizing the likelihood of the recognition result candidate shown in FIG. FIG. 4 shows the values of the first time and the second time, but at this time, only the result of the first time is calculated.
[0049]
The result selection unit 130 selects recognition result candidates in descending order of the first-utterance reliability (S55). The result filtering unit 160 refers to the misrecognition result storage unit 150, but nothing has been stored there because this is the first utterance. Therefore no selected recognition result candidate is removed (S56), and "Hirai" has the highest reliability. The result presentation unit 170 then presents "Hirai" to the user as the recognition result (S57).
[0050]
The user looks at the presented result, judges it to be a misrecognition, and inputs that judgment. The correction unit 140 receives the misrecognition input, stores "Hirai" in the misrecognition result storage unit 150 (S59), and waits for the input of the user's second utterance (S51).
[0051]
When the user inputs the second utterance, the recognition unit 100 executes the voice recognition, obtains the second result shown in FIG. 3 (S52), and stores it in the recognition result storage unit 110 (S53). The reliability calculation unit 120 normalizes the first and second results stored in the recognition result storage unit 110 to obtain the result of FIG. 4 (S54).
[0052]
The result selection unit 130 selects from the first and second results in descending order of reliability (S55). Referring to FIG. 4, the first-utterance "Hirai" has the highest reliability, 0.275, and the first-utterance "Iwai" is second with 0.273. These are followed in order by the first-utterance "Imai", the first-utterance "Shirai", the second-utterance "Hirai", the second-utterance "Shirai", and so on.
[0053]
The result filtering unit 160 removes "Hirai", which is stored in the misrecognition result storage unit 150, from the selected results (S56). In other words, the "Hirai" entries marked with the strike-through line in FIG. 4 are removed, and the first-utterance "Iwai" now has the highest reliability. The result presentation unit 170 presents "Iwai" to the user as the recognition result (S57). Since the user inputs that the presented result is correct, the result presentation unit 170 clears the recognition results and misrecognition results stored in the recognition result storage unit 110 and the misrecognition result storage unit 150 (S60) to prepare for a user utterance with new content.
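The two-utterance example can be traced with a short Python sketch. It uses only the FIG. 4 reliabilities that are quoted in the text; the first-utterance "Imai" is omitted because its value is not stated. The function name `select_then_filter` and the tuple layout are assumptions made for illustration.

```python
# Reliabilities quoted in the text for FIG. 4: (utterance, candidate, reliability).
# "Imai" is omitted because its value is not given in the text.
FIG4 = [
    (1, "Hirai", 0.275), (1, "Iwai", 0.273), (1, "Shirai", 0.214),
    (2, "Hirai", 0.176), (2, "Shirai", 0.126),
]

def select_then_filter(stored, misrecognized):
    # S55: rank all stored candidates of all utterances by reliability.
    ranked = sorted(stored, key=lambda item: item[2], reverse=True)
    # S56: drop every candidate that matches a stored misrecognition result.
    return [item for item in ranked if item[1] not in misrecognized]

# First utterance: nothing has been rejected yet.
first_pass = select_then_filter([c for c in FIG4 if c[0] == 1], set())
# Second utterance: "Hirai" was rejected after the first presentation.
second_pass = select_then_filter(FIG4, {"Hirai"})
print(first_pass[0][1], second_pass[0][1])  # prints "Hirai Iwai"
```

The second pass returns the first-utterance "Iwai" at the top even though "Iwai" is not a candidate of the second utterance at all.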
[0054]
As described above, in this specific example the correct answer "Iwai" is included only in the first utterance. If the first and second results were multiplied together as in the related art, the composite probability of "Iwai" would be 0 and the correct answer could not be obtained. Likewise, if the misrecognition result were simply removed from the second recognition result without using the first result, the correct answer could not be obtained, because it is not among the candidates of the second utterance.
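The failure of multiplication can be checked with a few lines of Python. This is a sketch under assumptions: the FIG. 4 reliabilities quoted in the text are used as stand-ins for the probabilities of the related art, a candidate absent from an utterance is given probability 0, and the function name `product_combination` is hypothetical.

```python
def product_combination(per_utterance):
    """Related-art style combination: multiply a candidate's probability across utterances."""
    candidates = set().union(*per_utterance)
    combined = {}
    for cand in candidates:
        value = 1.0
        for scores in per_utterance:
            value *= scores.get(cand, 0.0)  # an absent candidate contributes probability 0
        combined[cand] = value
    return combined

# Values quoted in the text for FIG. 4 ("Imai" omitted: value not stated).
fig4 = [
    {"Hirai": 0.275, "Iwai": 0.273, "Shirai": 0.214},
    {"Hirai": 0.176, "Shirai": 0.126},
]
combined = product_combination(fig4)
remaining = {c: p for c, p in combined.items() if c != "Hirai"}  # rejected after utterance 1
best = max(remaining, key=remaining.get)
```

Here `combined["Iwai"]` is 0, so even after "Hirai" is removed the product-based choice is "Shirai" and never the correct answer.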
[0055]
In the present method, the first and second results are each normalized to calculate reliabilities, recognition result candidates are selected from both the first and second results in descending order of reliability, and the misrecognition results are then taken into account, so the correct answer can be obtained. A speech recognition experiment was conducted with this method. The task covered most surnames in Japan, and the recognition rate was measured for the case where the first utterance was misrecognized and the same content was uttered again. The conventional method, which removes the first misrecognition result from the recognition result of the second utterance, achieved 52.8%; the present method improved this to 56.5%. Thus the present method yields a notable effect that cannot be obtained by simply combining conventional reliability with the removal of misrecognition results.
[0056]
Next, a second embodiment of the present invention will be described. FIG. 5 is a block diagram showing the configuration of the speech recognition device 20 according to the second embodiment of the present invention. Referring to FIG. 5, the speech recognition device 20 differs from the speech recognition device 10 according to the first embodiment in that the execution procedure of the result filtering unit 260 and the result selection unit 230 differs from that of the result filtering unit 160 and the result selection unit 130. In FIG. 5, components having the same functions as those in FIG. 1 are denoted by the same reference numerals as in FIG. 1, and their description is omitted.
[0057]
The result filtering unit 260 removes the misrecognition results stored in the misrecognition result storage unit 150 from the recognition result candidates whose reliability has been calculated by the reliability calculation unit 120. The result selection unit 230 pools the recognition result candidates from which the misrecognition results have been removed and selects from them in descending order of reliability. The result filtering unit 260 and the result selection unit 230 are normally realized by a program.
[0058]
FIG. 6 is a flowchart showing the operation of the second embodiment of the present invention. Comparing the flowchart of FIG. 6 with that of FIG. 2, steps S51 to S54 and S57 to S60 in FIG. 2 correspond to steps S61 to S64 and S67 to S70 in FIG. 6, respectively. The difference is that the execution order of steps S65 and S66 in FIG. 6 is reversed. That is, in FIG. 2 the recognition result candidates are first selected on the basis of reliability (S55) and the misrecognition results are then removed (S56), whereas in FIG. 6 the result filtering unit 260 first removes the misrecognition results (S65) and the result selection unit 230 then selects the recognition result candidates (S66).
[0059]
As described above, the second embodiment of the present invention provides the same effect as the first embodiment and, in addition, allows the result selection unit 230 to use the ranking information of the recognition result candidates from which the misrecognition results have already been removed. For example, the top-ranked remaining candidates of each utterance can be compared with one another on the basis of reliability.
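The filter-first order of the second embodiment, together with the comparison of per-utterance top candidates mentioned above, might look as follows. This is a sketch only: the function name `filter_then_select` and the return shape are assumptions, and the input again uses the FIG. 4 reliabilities quoted in the text ("Imai" omitted because its value is not stated).

```python
def filter_then_select(per_utterance, misrecognized):
    """per_utterance: list of {candidate: reliability}, one dict per utterance, oldest first."""
    tops = []
    for index, scores in enumerate(per_utterance, start=1):
        # S65: remove rejected candidates inside each utterance first.
        remaining = {c: p for c, p in scores.items() if c not in misrecognized}
        if remaining:
            # The per-utterance ranking is still available after filtering.
            best = max(remaining, key=remaining.get)
            tops.append((remaining[best], best, index))
    # S66: compare the per-utterance winners by reliability.
    return max(tops) if tops else None

fig4 = [
    {"Hirai": 0.275, "Iwai": 0.273, "Shirai": 0.214},
    {"Hirai": 0.176, "Shirai": 0.126},
]
winner = filter_then_select(fig4, {"Hirai"})  # (reliability, candidate, utterance index)
```

With "Hirai" rejected, the per-utterance winners are the first-utterance "Iwai" and the second-utterance "Shirai", and the comparison selects "Iwai".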
[0060]
Next, a third embodiment of the present invention will be described. FIG. 7 is a functional block diagram of the speech recognition device 30 according to the third embodiment of the present invention. Referring to FIG. 7, the speech recognition device 30 is different from the speech recognition device 10 according to the first embodiment of the present invention in that an arithmetic mean calculation unit 380 is newly added. The components having the same functions as those in FIG. 1 are denoted by the same reference numerals, and the description of these components will be omitted.
[0061]
The arithmetic mean calculation unit 380 assigns to each recognition result candidate whose reliability has been calculated by the reliability calculation unit 120 the arithmetic mean of its reliabilities over the utterances as a new reliability. The arithmetic mean calculation unit 380 is normally realized by a program and controls an arithmetic circuit (not shown) of the speech recognition device 30 to calculate the arithmetic mean.
[0062]
When the arithmetic mean is calculated, an utterance that does not include the recognition result candidate in question may be counted with a reliability of 0, or the mean may be taken only over the utterances that do include the candidate. For example, if there are five utterances in total and the candidate appears in only two of them, the mean of the reliabilities may be taken over those two utterances.
[0063]
FIG. 8 is a flowchart showing the operation of the third embodiment of the present invention. Comparing the flowchart of FIG. 8 with that of FIG. 2, steps S51 to S60 in FIG. 2 correspond to steps S71 to S80 in FIG. 8, respectively, and step S81 is added between steps S74 and S75 in FIG. 8. In step S81, the arithmetic mean calculation unit 380 takes the reliabilities calculated in step S74 for the recognition result candidates of each utterance, calculates the arithmetic mean of the reliabilities of each identical recognition result candidate, and uses this value as the new reliability.
[0064]
For example, when the arithmetic mean is taken over the number of utterances in which a recognition result candidate appears, using the reliabilities of FIG. 4, "Hirai" becomes (0.275 + 0.176) / 2 = 0.226; "Iwai" does not appear in the second utterance, so its value remains the first-utterance value of 0.273; and "Shirai" becomes (0.214 + 0.126) / 2 = 0.170. As a result, "Iwai" at 0.273 is the highest, and the correct recognition result is obtained at the second utterance.
[0065]
When instead the arithmetic mean is taken over the total number of utterances, using the reliabilities of FIG. 4, "Hirai" becomes (0.275 + 0.176) / 2 = 0.226, "Iwai" becomes (0.273 + 0) / 2 = 0.137, and "Shirai" becomes (0.214 + 0.126) / 2 = 0.170. As a result, "Hirai" at 0.226 is the highest and "Shirai" at 0.170 is next; because "Hirai", which was misrecognized in the first result, is removed in step S76, "Shirai" is selected. In this case the correct recognition result is not obtained at the second utterance, so the user judges "Shirai" to be a new misrecognition, it is stored in the misrecognition result storage unit 150, and processing of the third utterance begins.
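Both averaging variants can be reproduced with a short Python sketch. The function name `mean_reliability` and its boolean flag are assumptions made for illustration; the input values are the FIG. 4 reliabilities quoted in the text ("Imai" omitted because its value is not stated).

```python
def mean_reliability(per_utterance, over_all_utterances):
    """Average each candidate's reliability across utterances (step S81).

    over_all_utterances=True  counts a missing candidate as reliability 0.
    over_all_utterances=False averages only over utterances containing the candidate.
    """
    totals, appearances = {}, {}
    for scores in per_utterance:
        for cand, p in scores.items():
            totals[cand] = totals.get(cand, 0.0) + p
            appearances[cand] = appearances.get(cand, 0) + 1
    n = len(per_utterance)
    return {
        cand: totals[cand] / (n if over_all_utterances else appearances[cand])
        for cand in totals
    }

fig4 = [
    {"Hirai": 0.275, "Iwai": 0.273, "Shirai": 0.214},
    {"Hirai": 0.176, "Shirai": 0.126},
]
by_appearance = mean_reliability(fig4, over_all_utterances=False)
by_utterance = mean_reliability(fig4, over_all_utterances=True)
```

With these inputs, `by_appearance` ranks "Iwai" first, while `by_utterance` ranks "Hirai" first and "Shirai" second, which matches the two outcomes described in the text.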
[0066]
In the above examples, every utterance is given the same weight when the arithmetic mean is calculated. However, the arithmetic mean may instead be calculated with weights that depend on the chronological order of the utterances. For example, in FIG. 4 the second-utterance reliability may be weighted by 1 and the first-utterance reliability by a value between 0 and 1 (for example, 0.8). The way the weights are set is not limited to this, and the arithmetic mean calculation is not limited to the two examples above.
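One possible reading of the chronological weighting is sketched below. The text only gives the two-utterance case (weight 1 for the latest utterance and, for example, 0.8 for the one before); the geometric decay for older utterances, the division by the sum of the weights, the treatment of a missing candidate as reliability 0, and the name `weighted_mean_reliability` are all assumptions of this sketch.

```python
def weighted_mean_reliability(per_utterance, decay=0.8):
    """per_utterance is ordered oldest first; the latest utterance gets weight 1."""
    n = len(per_utterance)
    # Assumed scheme: each step back in time multiplies the weight by `decay`.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    totals = {}
    for weight, scores in zip(weights, per_utterance):
        for cand, p in scores.items():
            totals[cand] = totals.get(cand, 0.0) + weight * p
    norm = sum(weights)  # a candidate missing from an utterance contributes 0
    return {cand: total / norm for cand, total in totals.items()}

fig4 = [
    {"Hirai": 0.275, "Iwai": 0.273, "Shirai": 0.214},
    {"Hirai": 0.176, "Shirai": 0.126},
]
weighted = weighted_mean_reliability(fig4, decay=0.8)
```

Setting `decay=1.0` reduces this to the equal-weight mean over all utterances.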
[0067]
It is clear that, in addition to the first, second, and third configurations of the present invention described above, a new configuration in which the function of the arithmetic mean calculation unit 380 is added to the configuration of the second embodiment can easily be realized. Because none of these configurations multiplies the likelihoods of the individual utterances together, the correct answer can be obtained even if there is an utterance whose recognition result candidates do not include it.
[0068]
[Effect of the invention]
According to the present invention, no composite probability is computed by multiplying the score (reliability) obtained by normalizing the recognition result of a newly obtained utterance with the scores obtained by normalizing the recognition results of previous utterances. Therefore, even when the correct answer does not appear among the candidates of every utterance, the correct answer can be obtained, and the recognition rate can be improved.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a first exemplary embodiment of the present invention.
FIG. 2 is a flowchart showing an operation of the first exemplary embodiment of the present invention.
FIG. 3 is a diagram illustrating a specific example of likelihood according to the first embodiment of the present invention.
FIG. 4 is a diagram illustrating a specific example of reliability according to the first embodiment of the present invention.
FIG. 5 is a block diagram showing a configuration of a second exemplary embodiment of the present invention.
FIG. 6 is a flowchart illustrating an operation of the second exemplary embodiment of the present invention.
FIG. 7 is a block diagram illustrating a configuration of a third exemplary embodiment of the present invention.
FIG. 8 is a flowchart showing the operation of the third embodiment of the present invention.
FIG. 9 is a diagram showing a specific example for explaining a conventional technique.
[Explanation of symbols]
10 Speech recognition device
100 Recognition unit
110 Recognition result storage unit
120 Reliability calculation unit
130 Result selection unit
140 Correction unit
150 Misrecognition result storage unit
160 Result filtering unit
170 Result presentation unit
20 Speech recognition device
230 Result selection unit
260 Result filtering unit
30 Speech recognition device
380 Arithmetic mean calculation unit

Claims (24)

同内容の発声が複数入力される場合に、認識結果候補の確からしさを示す値を各発声間で比較可能となるように正規化した値を用いて各発声の各認識結果候補の中から認識結果を選択することを特徴とする音声認識装置。When multiple utterances with the same content are input, the value indicating the probability of the recognition result candidate is recognized from each recognition result candidate of each utterance using a value normalized so that it can be compared between each utterance A speech recognition device characterized by selecting a result. 同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出し、各発声の認識結果候補の中から信頼度に基づいて認識結果を選択することを特徴とする音声認識装置。When an utterance of the same content is input a plurality of times, the likelihood indicating the likelihood of each recognition result candidate obtained as a result of speech recognition for the utterance is normalized for each utterance to calculate reliability, and each utterance is recognized. A speech recognition device, wherein a recognition result is selected from result candidates based on reliability. 同内容の発声が複数回入力される場合に、発声に対する音声認識の結果として得た認識結果候補毎の確からしさを示す尤度を発声毎に正規化して信頼度を算出し、前回の発声までの間に誤認識とされた認識結果候補を除去した認識結果候補の中から信頼度に基づいて認識結果を選択することを特徴とする音声認識装置。When an utterance of the same content is input a plurality of times, the likelihood indicating the likelihood of each recognition result candidate obtained as a result of speech recognition for the utterance is normalized for each utterance to calculate reliability, and until the previous utterance A speech recognition apparatus characterized in that a recognition result is selected based on reliability from recognition result candidates from which recognition result candidates that have been erroneously recognized during the period are removed. 
発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算する認識部と、発声毎の認識結果候補とその尤度とを蓄積する認識結果蓄積部と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する信頼度計算部と、信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択する結果選択部と、を有することを特徴とする音声認識装置。A recognition unit that recognizes speech for each utterance and calculates likelihood indicating certainty for each of a plurality of recognition result candidates, and a recognition result accumulation unit that accumulates recognition result candidates for each utterance and their likelihoods, A reliability calculation unit that calculates a reliability that is a score normalized based on the likelihood of the recognition result candidate for each utterance, and a recognition result from among the recognition result candidates of each utterance accumulated based on the reliability. And a result selection unit for selecting. 発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算する認識部と、発声毎の認識結果候補とその尤度とを蓄積する認識結果蓄積部と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する信頼度計算部と、信頼度に基づいて蓄積された各発声の各認識結果候補の中から認識結果を選択する結果選択部と、前回までの発声に対する認識結果に対して誤認識と判定された誤認識結果を蓄積する誤認識情報蓄積部と、結果選択部で選択された認識結果から誤認識情報蓄積部に蓄積された誤認識結果を除去して認識結果を再選択する結果フィルタリング部と、を有することを特徴とする音声認識装置。A recognition unit that recognizes speech for each utterance and calculates likelihood indicating certainty for each of a plurality of recognition result candidates, and a recognition result accumulation unit that accumulates recognition result candidates for each utterance and their likelihoods, A reliability calculation unit that calculates a reliability that is a score normalized based on the likelihood of the recognition result candidate for each utterance, and a recognition result from among the recognition result candidates of each utterance accumulated based on the reliability. 
A result selection unit to select, a misrecognition information storage unit for storing misrecognition results determined to be misrecognition with respect to recognition results for previous utterances, and misrecognition information storage from the recognition results selected by the result selection unit A result filtering unit that removes an erroneous recognition result accumulated in the unit and reselects a recognition result. 発声毎に音声を認識し単一または複数の認識結果候補毎の確かさを示す尤度を計算する認識部と、発声毎の認識結果候補とその尤度とを蓄積する認識結果蓄積部と、発声毎に認識結果候補の尤度に基づいて正規化したスコアである信頼度を計算する信頼度計算部と、前回までの発声に対する認識結果に対して誤認識と判定された誤認識結果を蓄積する誤認識情報蓄積部と、発声毎の認識結果候補から誤認識結果を除去する結果フィルタリング部と、誤認識結果を除去した後の各発声の各認識結果候補の中から認識結果を選択する結果選択部と、を有することを特徴とする音声認識装置。A recognition unit that recognizes speech for each utterance and calculates likelihood indicating certainty for each of a plurality of recognition result candidates, and a recognition result accumulation unit that accumulates recognition result candidates for each utterance and their likelihoods, A reliability calculation unit that calculates a reliability that is a score normalized based on the likelihood of a recognition result candidate for each utterance, and accumulates erroneous recognition results determined as erroneous recognition with respect to recognition results for previous utterances Error recognition information storage unit, a result filtering unit that removes false recognition results from recognition result candidates for each utterance, and a result that selects a recognition result from each recognition result candidate of each utterance after removing the false recognition results And a selecting unit. 
A speech recognition apparatus that, when an utterance of the same content is input a plurality of times, calculates a reliability by normalizing, for each utterance, the likelihood indicating the certainty of each recognition result candidate obtained by speech recognition of that utterance; further obtains a combined reliability by calculating, for each identical recognition result candidate, the arithmetic mean over the recognition result candidates of all utterances; and selects a recognition result on the basis of the combined reliability from among the recognition result candidates that remain after removing those determined to be misrecognitions up to the previous utterance.
A speech recognition apparatus comprising: a recognition unit that recognizes speech for each utterance and calculates a likelihood indicating the certainty of each of one or more recognition result candidates; a recognition result storage unit that stores the recognition result candidates for each utterance together with their likelihoods; a reliability calculation unit that calculates, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; an arithmetic mean calculation unit that obtains a combined reliability by calculating, for each identical recognition result candidate, the arithmetic mean over the recognition result candidates of all utterances; a result selection unit that selects a recognition result from among the stored recognition result candidates of each utterance on the basis of the combined reliability; a misrecognition information storage unit that stores misrecognition results, namely recognition results for utterances up to the previous one that were determined to be misrecognitions; and a result filtering unit that removes the misrecognition results stored in the misrecognition information storage unit from the recognition result selected by the result selection unit and performs reselection.
A speech recognition method wherein, when a plurality of utterances of the same content are input, a recognition result is selected from among the recognition result candidates of each utterance using values obtained by normalizing the value indicating the certainty of each recognition result candidate so that the values are comparable across utterances.
A speech recognition method wherein, when an utterance of the same content is input a plurality of times, a reliability is calculated by normalizing, for each utterance, the likelihood indicating the certainty of each recognition result candidate obtained by speech recognition of that utterance, and a recognition result is selected from among the recognition result candidates of each utterance on the basis of the reliability.
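The normalization described in the two method claims above can be illustrated with a minimal sketch. The claims do not fix a normalization formula, so this example assumes a softmax over the log-likelihoods of one utterance; the function names `reliabilities` and `select_result` and the sample scores are hypothetical and are not taken from the patent.

```python
import math

def reliabilities(log_likelihoods):
    """Normalize one utterance's candidate log-likelihoods so they sum to 1.

    Assumption: softmax normalization; the claims only require that the
    normalized values be comparable across utterances.
    """
    peak = max(log_likelihoods.values())  # subtract the peak for numerical stability
    weights = {c: math.exp(v - peak) for c, v in log_likelihoods.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def select_result(utterances):
    """Return the (candidate, reliability) pair with the highest reliability
    among the candidates of all utterances."""
    best = None
    for log_likelihoods in utterances:
        for cand, rel in reliabilities(log_likelihoods).items():
            if best is None or rel > best[1]:
                best = (cand, rel)
    return best

# Hypothetical log-likelihoods for two utterances of the same content.
first = {"Osaka": -120.0, "Okayama": -121.0}
second = {"Okayama": -300.0, "Osaka": -305.0}
result = select_result([first, second])
```

With these sample values the raw log-likelihoods would favor "Osaka" from the first utterance (-120.0 is the largest raw score), whereas the normalized reliabilities favor "Okayama" from the second utterance, where the top candidate is much more clearly separated from its competitor.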
A speech recognition method wherein, when an utterance of the same content is input a plurality of times, a reliability is calculated by normalizing, for each utterance, the likelihood indicating the certainty of each recognition result candidate obtained by speech recognition of that utterance, and a recognition result is selected on the basis of the reliability from among the recognition result candidates that remain after removing those determined to be misrecognitions up to the previous utterance.
A speech recognition method comprising: recognizing speech for each utterance, and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; and selecting a recognition result from among the stored recognition result candidates of each utterance on the basis of the calculated reliability.
A speech recognition method comprising: recognizing speech for each utterance, and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; selecting a recognition result from among the stored recognition result candidates of each utterance on the basis of the calculated reliability; and removing from the selected recognition result the stored misrecognition results, namely recognition results for utterances up to the previous one that were determined to be misrecognitions, and reselecting a recognition result.
A speech recognition method comprising: recognizing speech for each utterance, and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; removing from the stored recognition result candidates the stored misrecognition results, namely recognition results for utterances up to the previous one that were determined to be misrecognitions; and selecting a recognition result from among the recognition result candidates of each utterance that remain after the misrecognition results are removed.
A speech recognition method wherein, when an utterance of the same content is input a plurality of times, a reliability is calculated by normalizing, for each utterance, the likelihood indicating the certainty of each recognition result candidate obtained by speech recognition of that utterance; a combined reliability is further obtained by calculating, for each identical recognition result candidate, the arithmetic mean over the recognition result candidates of all utterances; and a recognition result is selected on the basis of the combined reliability from among the recognition result candidates that remain after removing those determined to be misrecognitions up to the previous utterance.
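The filter-then-select order used in the method claims above can be sketched as follows. This is only an illustration under stated assumptions: candidates are represented as `(text, reliability)` pairs that have already been normalized per utterance, and the name `select_excluding` and the sample data are hypothetical.

```python
def select_excluding(candidates, misrecognized):
    """Pick the most reliable candidate that was not previously rejected.

    candidates: list of (text, reliability) pairs pooled from all utterances.
    misrecognized: set of texts already determined to be misrecognitions.
    Returns None when every candidate has been rejected.
    """
    remaining = [(t, r) for t, r in candidates if t not in misrecognized]
    if not remaining:
        return None
    return max(remaining, key=lambda pair: pair[1])

# Hypothetical pooled candidates from two utterances of the same content.
pooled = [
    ("Osaka", 0.73), ("Okayama", 0.27),                   # first utterance
    ("Osaka", 0.55), ("Okayama", 0.40), ("Oita", 0.05),   # second utterance
]
rejected = {"Osaka"}  # the result presented after the first utterance was rejected
choice = select_excluding(pooled, rejected)
```

Because "Osaka" has already been rejected, it is never offered again even though it still has the highest reliability in both utterances.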
A speech recognition method comprising: recognizing speech for each utterance, and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; obtaining a combined reliability by calculating, for each identical recognition result candidate, the arithmetic mean over the recognition result candidates of all utterances; selecting a recognition result from among the stored recognition result candidates of each utterance on the basis of the combined reliability; and removing from the selected recognition result the stored misrecognition results, namely recognition results for utterances up to the previous one that were determined to be misrecognitions, and performing reselection.
A program that causes a computer to execute a procedure of, when a plurality of utterances of the same content are input, selecting a recognition result from among the recognition result candidates of each utterance using values obtained by normalizing the value indicating the certainty of each recognition result candidate so that the values are comparable across utterances.
A program that causes a computer to execute: a procedure of, when an utterance of the same content is input a plurality of times, calculating a reliability by normalizing, for each utterance, the likelihood indicating the certainty of each recognition result candidate obtained by speech recognition of that utterance; and a procedure of selecting a recognition result from among the recognition result candidates of each utterance on the basis of the reliability.
A program that causes a computer to execute: a procedure of, when an utterance of the same content is input a plurality of times, calculating a reliability by normalizing, for each utterance, the likelihood indicating the certainty of each recognition result candidate obtained by speech recognition of that utterance; and a procedure of selecting a recognition result on the basis of the reliability from among the recognition result candidates that remain after removing those determined to be misrecognitions up to the previous utterance.
A program that causes a computer to execute: a procedure of recognizing speech for each utterance, and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; a procedure of calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; and a procedure of selecting a recognition result from among the stored recognition result candidates of each utterance on the basis of the calculated reliability.
A program that causes a computer to execute: a procedure of recognizing speech for each utterance, and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; a procedure of calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; a procedure of selecting a recognition result from among the stored recognition result candidates of each utterance on the basis of the calculated reliability; and a procedure of removing from the selected recognition result the stored misrecognition results, namely recognition results for utterances up to the previous one that were determined to be misrecognitions, and reselecting a recognition result.
A program that causes a computer to execute: a procedure of recognizing speech for each utterance, and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; a procedure of calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; a procedure of removing from the stored recognition result candidates the stored misrecognition results, namely recognition results for utterances up to the previous one that were determined to be misrecognitions; and a procedure of selecting a recognition result from among the recognition result candidates of each utterance that remain after the misrecognition results are removed.
A program that causes a computer to execute: a procedure of, when an utterance of the same content is input a plurality of times, calculating a reliability by normalizing, for each utterance, the likelihood indicating the certainty of each recognition result candidate obtained by speech recognition of that utterance; a procedure of further obtaining a combined reliability by calculating, for each identical recognition result candidate, the arithmetic mean over the recognition result candidates of all utterances; and a procedure of selecting a recognition result on the basis of the combined reliability from among the recognition result candidates that remain after removing those determined to be misrecognitions up to the previous utterance.
A program that causes a computer to execute: a procedure of recognizing speech for each utterance, and calculating and storing a likelihood indicating the certainty of each of one or more recognition result candidates; a procedure of calculating, for each utterance, a reliability, which is a score normalized on the basis of the likelihoods of the recognition result candidates; a procedure of obtaining a combined reliability by calculating, for each identical recognition result candidate, the arithmetic mean over the recognition result candidates of all utterances; a procedure of selecting a recognition result from among the stored recognition result candidates of each utterance on the basis of the combined reliability; and a procedure of removing from the selected recognition result the stored misrecognition results, namely recognition results for utterances up to the previous one that were determined to be misrecognitions, and performing reselection.
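The combined reliability in the claims above is an arithmetic mean taken per identical candidate across utterances, followed by selection and, where needed, reselection. A minimal sketch follows. The claims do not say how to treat a candidate that is missing from some utterance, so this example assumes a missing candidate contributes 0.0 and that the sum is divided by the total number of utterances; the function names and sample reliabilities are hypothetical.

```python
def combined_reliability(per_utterance):
    """Average each candidate's reliability over all utterances.

    per_utterance: list of dicts mapping candidate text to its reliability.
    Assumption: a candidate absent from an utterance counts as 0.0 there.
    """
    n = len(per_utterance)
    totals = {}
    for rels in per_utterance:
        for cand, rel in rels.items():
            totals[cand] = totals.get(cand, 0.0) + rel
    return {cand: total / n for cand, total in totals.items()}

def select_and_reselect(per_utterance, misrecognized):
    """Rank candidates by combined reliability and return the best one that
    has not been stored as a misrecognition, or None if none remains."""
    combined = combined_reliability(per_utterance)
    ranked = sorted(combined, key=combined.get, reverse=True)
    for cand in ranked:
        if cand not in misrecognized:
            return cand, combined[cand]
    return None

# Hypothetical reliabilities: the top candidate is wrong in both utterances.
utterances = [
    {"Osaka": 0.5, "Okayama": 0.4, "Oita": 0.1},
    {"Oita": 0.5, "Okayama": 0.4, "Osaka": 0.1},
]
best = select_and_reselect(utterances, misrecognized={"Osaka"})
```

In this sample neither utterance ranks "Okayama" first on its own, but its mean reliability (0.4) exceeds that of "Osaka" and "Oita" (0.3 each), so it is selected once the utterances are combined.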
JP2003007378A 2003-01-15 2003-01-15 Speech recognition apparatus, speech recognition method, and program Expired - Lifetime JP3695448B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003007378A JP3695448B2 (en) 2003-01-15 2003-01-15 Speech recognition apparatus, speech recognition method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003007378A JP3695448B2 (en) 2003-01-15 2003-01-15 Speech recognition apparatus, speech recognition method, and program

Publications (2)

Publication Number Publication Date
JP2004219747A JP2004219747A (en) 2004-08-05
JP3695448B2 JP3695448B2 (en) 2005-09-14

Family

ID=32897501

Family Applications (1)

Application Number Priority Date Filing Date Title
JP2003007378A Expired - Lifetime JP3695448B2 (en) 2003-01-15 2003-01-15 Speech recognition apparatus, speech recognition method, and program

Country Status (1)

Country Link
JP (1) JP3695448B2 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05108091A (en) * 1991-10-17 1993-04-30 Ricoh Co Ltd Speech recognition device
JPH08263091A (en) * 1995-03-22 1996-10-11 N T T Data Tsushin Kk Device and method for recognition
JPH10133684A (en) * 1996-10-31 1998-05-22 Microsoft Corp Method and system for selecting alternative word during speech recognition
JPH11149294A (en) * 1997-11-17 1999-06-02 Toyota Motor Corp Voice recognition device and voice recognition method
JPH11194793A (en) * 1997-12-26 1999-07-21 Nec Corp Voice word processor
JP2000250585A (en) * 1999-02-25 2000-09-14 Nippon Telegr & Teleph Corp <Ntt> Interactive database retrieving method and device and recording medium recorded with interactive database retrieving program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011203434A (en) * 2010-03-25 2011-10-13 Fujitsu Ltd Voice recognition device and voice recognition method
WO2018043137A1 (en) * 2016-08-31 2018-03-08 ソニー株式会社 Information processing device and information processing method
CN109643545A (en) * 2016-08-31 2019-04-16 索尼公司 Information processing equipment and information processing method

Also Published As

Publication number Publication date
JP3695448B2 (en) 2005-09-14

Similar Documents

Publication Publication Date Title
CN105723449B (en) Speech content analysis system and speech content analysis method
US6134527A (en) Method of testing a vocabulary word being enrolled in a speech recognition system
US8024188B2 (en) Method and system of optimal selection strategy for statistical classifications
JP5366169B2 (en) Speech recognition system and program for speech recognition system
US9043209B2 (en) Language model creation device
US8050929B2 (en) Method and system of optimal selection strategy for statistical classifications in dialog systems
US7949524B2 (en) Speech recognition correction with standby-word dictionary
JP5440177B2 (en) Word category estimation device, word category estimation method, speech recognition device, speech recognition method, program, and recording medium
JP4728972B2 (en) Indexing apparatus, method and program
US20070100814A1 (en) Apparatus and method for detecting named entity
JP2001092496A (en) Continuous voice recognition device and recording medium
WO2012001458A1 (en) Voice-tag method and apparatus based on confidence score
JPWO2010128560A1 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
CN114155839A (en) Voice endpoint detection method, device, equipment and storage medium
JP2016177045A (en) Voice recognition device and voice recognition program
JP2008046633A (en) Speech recognition by statistical language using square-root discounting
JP3695448B2 (en) Speech recognition apparatus, speech recognition method, and program
JP5201973B2 (en) Voice search device
JP2004046106A (en) Speech recognition device and speech recognition program
JP4604424B2 (en) Speech recognition apparatus and method, and program
JP3621922B2 (en) Sentence recognition apparatus, sentence recognition method, program, and medium
JP6497651B2 (en) Speech recognition apparatus and speech recognition program
KR100449912B1 (en) Apparatus and method for detecting topic in speech recognition system
JP2002259912A (en) Online character string recognition device and online character string recognition method
JPH09311694A (en) Speech recognition device

Legal Events

Date Code Title Description
A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20050125

RD01 Notification of change of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7421

Effective date: 20050310

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20050322

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20050513

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20050607

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20050620

R150 Certificate of patent or registration of utility model

Ref document number: 3695448

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090708

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100708

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110708

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110708

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120708

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120708

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130708

Year of fee payment: 8

EXPY Cancellation because of completion of term