JP4498906B2

JP4498906B2 - Voice recognition device

Info

Publication number: JP4498906B2
Application number: JP2004351488A
Authority: JP
Inventors: 玲子岡田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2004-12-03
Filing date: 2004-12-03
Publication date: 2010-07-07
Anticipated expiration: 2024-12-03
Also published as: JP2006162782A

Description

The present invention relates to a voice recognition device used, for example, to operate a navigation system by voice, and more particularly to a technique for improving the voice recognition rate.

Voice recognition devices that recognize speech uttered by a user and output a recognition result are known. An ordinary voice recognition device is provided with a recognition dictionary used in the voice recognition processing, and the vocabulary to be recognized is stored in that dictionary. In the voice recognition processing, the vocabulary uttered by the user is analyzed and collated with the vocabulary stored in the recognition dictionary, and the recognition result is presented.

As a technique related to voice recognition, Patent Document 1 discloses a voice input identification method in which the data to be input is classified into a plurality of groups in advance and stored in dictionaries. In this method, the characters corresponding to speech are classified into three groups, namely an unvoiced sound (seion) group, a voiced sound (dakuon) group, and a semi-voiced sound (handakuon) group, and stored in dictionaries. Recognition processing is then performed using the dictionary selected by a switch, which makes it possible to input any single designated character by voice.

Patent Document 2 discloses a portable electronic dictionary device that has no alphabet buttons and enables dictionary lookup in real time. In this electronic dictionary device, one voice recognition button serves several operations, so that starting voice recognition, stopping it, and presenting candidates can all be done with a single button rather than a plurality of switches. Pressing the voice recognition button for 1 second or more starts voice input; pressing it once within 0.5 seconds presents the next candidate of the recognition result after a misrecognition; and pressing it twice within 0.5 seconds interrupts voice recognition.

Patent Document 1: JP-A-55-2040
Patent Document 2: JP-A-10-171492

In the conventional voice recognition devices described above, performing voice recognition with a large number of recognition dictionaries increases the vocabulary that the voice recognition processing can recognize, so that a large recognition vocabulary can be accepted at one time. However, the larger the vocabulary to be recognized, the more often misrecognition occurs.

With the technique disclosed in Patent Document 1, characters can be input only one at a time, so a continuous word cannot be input in a single utterance; to input a plurality of characters, the operation must be repeated as many times as there are characters in the input word. In addition, switching dictionaries requires one switch per dictionary. The technique disclosed in Patent Document 2 merely makes it possible to start, correct, and interrupt recognition by operating the voice recognition button; it cannot improve the recognition rate by designating or switching the dictionary to be used.

The present invention has been made to solve the problems described above, and its object is to provide a voice recognition device that, with a simple configuration, can recognize the intended vocabulary at a high recognition rate.

The voice recognition device according to the present invention comprises a plurality of recognition dictionaries for recognizing speech, a recognition start button for starting voice recognition, control means for setting one of the plurality of recognition dictionaries to valid according to the number of times the recognition start button is operated, and voice recognition means for performing voice recognition using the recognition dictionary set to valid by the control means. When the recognition start button is held down for a predetermined time or longer while one recognition dictionary is selected, the control means sets the selected recognition dictionary to always valid, and the voice recognition means performs voice recognition using the recognition dictionaries set to valid and to always valid.

According to the present invention, after the dictionary to be used for voice recognition is selected by pressing the recognition start button, the selected recognition dictionary is set to always valid when the recognition start button is held down for a predetermined time or longer while one recognition dictionary is selected, and the recognition processing is then performed. The vocabulary can therefore be narrowed down reliably by selecting the recognition dictionary, and the intended vocabulary can be recognized at a high recognition rate. In addition, the recognition dictionary used for voice recognition is selected by the number of times the recognition start button is pressed, so that the button for starting voice recognition is shared with dictionary selection. The voice recognition device can therefore be configured simply and at low cost.
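The selection behavior summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `DictionarySelector`, the method names, and the 1-second value of `LONG_PRESS_SECONDS` are assumptions made for illustration (the text only speaks of "a predetermined time").

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Assumed threshold; the source only says "a predetermined time or longer".
LONG_PRESS_SECONDS = 1.0


@dataclass
class DictionarySelector:
    """Hypothetical model of selecting dictionaries with a single button."""
    names: List[str]
    selected: Optional[int] = None          # index of the currently valid dictionary
    always_valid: Set[int] = field(default_factory=set)
    presses: int = 0

    def short_press(self) -> None:
        # Each press advances the selection cyclically through the dictionaries.
        self.presses += 1
        self.selected = (self.presses - 1) % len(self.names)

    def long_press(self, held_seconds: float) -> None:
        # Holding the button while a dictionary is selected pins it as always valid.
        if self.selected is not None and held_seconds >= LONG_PRESS_SECONDS:
            self.always_valid.add(self.selected)

    def active(self) -> List[str]:
        # Recognition uses the valid dictionary plus every always-valid one.
        indices = set(self.always_valid)
        if self.selected is not None:
            indices.add(self.selected)
        return [self.names[i] for i in sorted(indices)]
```

For example, pressing once, holding the button, and then pressing twice more leaves the address dictionary pinned while the telephone number dictionary is the currently selected one.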

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a voice recognition device according to Embodiment 1 of the present invention. This voice recognition device comprises a voice recognition dictionary 1, recognition dictionary management means 2, voice recognition means 3, control means 4, output information control means 5, manual input means 6, key code discrimination means 7, voice input means 8, screen output means 9, voice output means 10, a remote controller (hereinafter "remote control") 11, a microphone 12, a monitor 13, and a speaker 14. The remote control 11 is provided with a recognition start button 15 for starting voice recognition. Pressing the recognition start button 15 starts the recognition processing, and speech uttered toward the microphone 12 is recognized.

The voice recognition dictionary 1 is composed of a plurality of recognition dictionaries. In the voice recognition device according to Embodiment 1, the voice recognition dictionary 1 consists of four recognition dictionaries #1 to #4 classified by vocabulary type. Specifically, recognition dictionary #1 is an "address dictionary" containing vocabulary used in addresses (prefecture names, municipality names, and so on); recognition dictionary #2 is a "facility name dictionary" containing vocabulary used in facility names (Tokyo Tower and the like); recognition dictionary #3 is a "telephone number dictionary" containing vocabulary used in telephone numbers (digits); and recognition dictionary #4 is a "song title dictionary" containing vocabulary used in song titles.

The recognition dictionary management means 2 manages the recognition dictionaries #1 to #4 that make up the voice recognition dictionary 1. It holds a management table 21 as shown in FIG. 2. The management table 21 stores a dictionary number (#1 to #4), a dictionary name (address, facility name, telephone number, song title), and a state (invalid or valid). In response to a dictionary switching request sent from the control means 4, the recognition dictionary management means 2 selects the one recognition dictionary to be used for recognition, sets the state of the selected recognition dictionary to "valid", and sets the other recognition dictionaries to "invalid". When voice recognition is performed, the recognition dictionary management means 2 reads out the contents of the recognition dictionary set to "valid" and sends them to the voice recognition means 3.
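The management table and the switching behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not the device's implementation: the class name `RecognitionDictionaryManager`, its method names, and the sample vocabulary entries are assumptions.

```python
from typing import Dict, List, Tuple


class RecognitionDictionaryManager:
    """Hypothetical model of management table 21 (number, name, state)."""

    def __init__(self, dictionaries: Dict[int, Tuple[str, List[str]]]) -> None:
        # Every dictionary starts out "invalid" until a switching request arrives.
        self.table = {
            number: {"name": name, "state": "invalid", "vocabulary": list(vocab)}
            for number, (name, vocab) in dictionaries.items()
        }

    def switch_to(self, number: int) -> None:
        # Handle a dictionary switching request: exactly one dictionary is valid.
        if number not in self.table:
            raise KeyError(f"unknown dictionary number: {number}")
        for num, row in self.table.items():
            row["state"] = "valid" if num == number else "invalid"

    def valid_vocabulary(self) -> List[str]:
        # Contents of the valid dictionary, as handed to the recognizer.
        return [
            word
            for row in self.table.values()
            if row["state"] == "valid"
            for word in row["vocabulary"]
        ]


# Sample vocabulary is illustrative only.
manager = RecognitionDictionaryManager({
    1: ("address", ["Tokyo", "Osaka"]),
    2: ("facility name", ["Tokyo Tower"]),
    3: ("telephone number", ["0", "1", "2"]),
    4: ("song title", ["Sakura"]),
})
manager.switch_to(2)
```

After `manager.switch_to(2)`, only the facility name dictionary is marked valid, so `manager.valid_vocabulary()` contains only facility names.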

The voice recognition means 3 executes voice recognition processing with reference to the voice recognition dictionary 1. It is composed of a voice analysis processing unit 31 and a matching processing unit 32. When the control means 4 instructs it to start recognition, the voice analysis processing unit 31 analyzes the voice data sent from the voice input means 8 and sends the analysis result to the matching processing unit 32. The matching processing unit 32 executes matching processing that compares the analysis result sent from the voice analysis processing unit 31 with the vocabulary sent from the voice recognition dictionary 1 via the recognition dictionary management means 2. The recognition result obtained by this matching processing, specifically the recognized vocabulary and its probability of being correct (hereinafter the "score"), is sent to the control means 4.

The control means 4 controls the entire voice recognition device by exchanging data with the recognition dictionary management means 2, the voice recognition means 3, the output information control means 5, the manual input means 6, and the key code discrimination means 7. Details of the control means 4 are described later.

The output information control means 5 controls screen display and voice output. It is composed of an output information table 51, screen output generation means 52, and voice output generation means 53. As shown in FIG. 3, the output information table 51 stores, for each output information number, the screen display text and utterance example text displayed on the monitor 13 and the voice guidance text output as speech from the speaker 14. The screen display text and utterance example text stored in the output information table 51 are read out by the screen output generation means 52, and the voice guidance text is read out by the voice output generation means 53.

The screen output generation means 52 generates screen output data from the screen display text and utterance example text that it reads from the output information table 51 according to the screen display information (described later) sent from the control means 4. It also generates screen output data from the recognized vocabulary sent from the control means 4. The screen output data generated by the screen output generation means 52 is sent to the screen output means 9. The voice output generation means 53 generates voice output data from the voice guidance text that it reads from the output information table 51 according to the screen display information sent from the control means 4. It also generates voice output data from the recognized vocabulary sent from the control means 4. The voice output data generated by the voice output generation means 53 is sent to the voice output means 10.

The manual input means 6 receives the key code sent from the remote control 11 when a key on the remote control 11 is pressed, and generates a key event. The key event generated by the manual input means 6 and the key code are sent to the control means 4. Instead of the remote control 11, other input means such as a touch panel or push-button switches can also be used to generate key events and key codes.

The key code discrimination means 7 discriminates the key code sent from the control means 4. For example, when the key code of the recognition start button 15 is sent from the control means 4, the key code discrimination means 7 determines that the key code corresponds to the recognition start button 15 and returns the determination result to the control means 4.

The voice input means 8 is composed of, for example, an A/D converter. It receives the voice signal generated by the microphone 12 when a person speaks and converts it into digital voice data that the voice recognition means 3 can handle. The voice data obtained by this conversion is sent to the voice analysis processing unit 31 of the voice recognition means 3.

The screen output means 9 is composed of, for example, a D/A converter. It converts the screen output data sent from the output information control means 5 into an analog video signal and sends it to the monitor 13. As a result, an image consisting of text and pictures corresponding to the screen output data is displayed on the monitor 13. The monitor 13 can be, for example, a liquid crystal display device or a CRT device.

The voice output means 10 is composed of, for example, a D/A converter. It converts the voice output data sent from the output information control means 5 into an analog voice signal and sends it to the speaker 14. As a result, speech corresponding to the voice output data is output from the speaker 14.

Next, the control means 4 is described in detail. The control means 4 is composed of a switching count processing unit 41, a switching correspondence table 42, a dictionary switching processing unit 43, a recognition engine control processing unit 44, a screen switching processing unit 45, and a recognition result determination processing unit 46.

The switching count processing unit 41 receives a key event from the manual input means 6 and sends the key code that arrived with the key event to the key code discrimination means 7. When the key code discrimination means 7 replies that the key code is that of the recognition start button 15, the switching count processing unit 41 counts the number of times the recognition start button 15 has been pressed, that is, the number of times the recognition dictionary has been switched. The switching count processing unit 41 also determines the recognition dictionary to be used by referring to the switching correspondence table 42 according to the counted number of switches, sends the dictionary number to the dictionary switching processing unit 43, and sends the screen display information corresponding to that dictionary number to the screen switching processing unit 45.

As shown in FIG. 4, the switching correspondence table 42 stores a "recognition target", a "dictionary number", and "screen display information" for each switching count (the number of times n that the recognition start button 15 has been pressed). The "recognition target" indicates whether the switching count corresponds to an address, a facility name, a telephone number, or a song title. The "dictionary number" indicates which of the recognition dictionaries #1 to #4 the switching count corresponds to. The "screen display information" indicates which of the output information numbers 1 to 4 shown in FIG. 3 the switching count corresponds to. As described above, the switching correspondence table 42 is referred to by the switching count processing unit 41.
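A lookup of this kind can be sketched in Python as follows. The contents of FIG. 4 are not reproduced in the text, so the rows of `SWITCH_TABLE` below (count n maps to dictionary #n and screen number n) are an assumption consistent with the description, and the function name `lookup_switch` is hypothetical. The cyclic wrap-around follows the behavior described for repeated presses.

```python
# Assumed contents of switching correspondence table 42; FIG. 4 itself is not
# available here, so the rows are inferred from the surrounding description.
SWITCH_TABLE = {
    1: {"target": "address", "dictionary": 1, "screen": 1},
    2: {"target": "facility name", "dictionary": 2, "screen": 2},
    3: {"target": "telephone number", "dictionary": 3, "screen": 3},
    4: {"target": "song title", "dictionary": 4, "screen": 4},
}


def lookup_switch(press_count: int) -> dict:
    """Return the table row for the given number of button presses."""
    if press_count < 1:
        raise ValueError("press_count must be at least 1")
    # Wrap around so that presses 5, 9, ... map back to the first row.
    key = (press_count - 1) % len(SWITCH_TABLE) + 1
    return SWITCH_TABLE[key]
```

With this mapping, `lookup_switch(5)` returns the same row as `lookup_switch(1)`, matching the cyclic switching of the dictionaries.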

According to the dictionary number sent from the switching count processing unit 41, the dictionary switching processing unit 43 generates a dictionary switching request to switch to the recognition dictionary designated by that number and sends it to the recognition dictionary management means 2. When it generates a dictionary switching request, the dictionary switching processing unit 43 also notifies the recognition engine control processing unit 44 to that effect.

In response to the notification from the dictionary switching processing unit 43 that a dictionary switching request has been generated, the recognition engine control processing unit 44 sends recognition start requests and recognition stop requests to the voice analysis processing unit 31 of the voice recognition means 3. The voice recognition means 3 starts or stops the voice recognition processing according to these requests.

According to the screen display information sent from the switching count processing unit 41, the screen switching processing unit 45 generates a screen switching request to switch to the screen designated by that information and sends it to the output information control means 5. The screen switching processing unit 45 also sends the recognized vocabulary received from the recognition result determination processing unit 46 to the output information control means 5.

The recognition result determination processing unit 46 makes the final determination based on the recognized vocabulary and its score sent as the recognition result from the matching processing unit 32 of the voice recognition means 3, and fixes the recognized vocabulary. The recognized vocabulary fixed by this determination is sent to the screen switching processing unit 45. The fixed recognized vocabulary is also used by various applications, a detailed description of which is omitted.
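The text does not state the rule by which the final determination is made. Purely as an illustration, the Python sketch below assumes the simplest plausible rule: take the candidate with the highest score and reject it if the score falls below a threshold. The function name `decide_vocabulary`, the candidate format, and the `min_score` value of 0.5 are all assumptions.

```python
from typing import List, Optional, Tuple


def decide_vocabulary(
    candidates: List[Tuple[str, float]],
    min_score: float = 0.5,  # assumed rejection threshold, not from the source
) -> Optional[str]:
    """Pick the best-scoring candidate, or None if nothing is convincing."""
    if not candidates:
        return None
    # Each candidate is (recognized vocabulary, score); higher score is better.
    word, score = max(candidates, key=lambda candidate: candidate[1])
    return word if score >= min_score else None
```

Under these assumptions, a candidate list such as `[("Tokyo Tower", 0.9), ("Tokyo Dome", 0.6)]` would be resolved to "Tokyo Tower".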

Next, the operation of the voice recognition device according to Embodiment 1 of the present invention, configured as described above, is described with reference to the flowchart shown in FIG. 5.

When the voice recognition device is started, the dictionary number n is first initialized to "1" (step ST1). Next, the presence or absence of key input from the remote control 11 is checked (step ST2). That is, the control means 4 checks whether a key event has been sent from the manual input means 6. If it is determined that there is no key input, the device enters a standby state, repeating step ST2 while waiting for key input.

When a key input is made in the standby state in which step ST2 is repeated, it is checked whether the recognition start button 15 was pressed (step ST3). That is, the switching count processing unit 41 of the control means 4 sends the key code that arrives with the key event from the manual input means 6 to the key code discrimination means 7. Based on the determination result returned by the key code discrimination means 7, it then checks whether the key event belongs to the recognition start button 15.

If it is determined in step ST3 that the recognition start button 15 was pressed, the recognition dictionary is switched (step ST4). That is, the switching count processing unit 41 counts the switching count n and refers to the switching correspondence table 42 to determine the recognition dictionary to be used. It then sends the dictionary number of the determined recognition dictionary to the dictionary switching processing unit 43 and sends the screen display information corresponding to that dictionary number to the screen switching processing unit 45. The dictionary switching processing unit 43 generates a dictionary switching request according to this dictionary number and sends it to the recognition dictionary management means 2. In response to the dictionary switching request, the recognition dictionary management means 2 selects the one recognition dictionary to be used, sets the state of the selected recognition dictionary to "valid", and sets the other recognition dictionaries to "invalid". Since the switching count n is initially set to "1", recognition dictionary #1, that is, the address dictionary, is selected first.

Next, the screen display and voice output are switched (step ST5). That is, the screen switching processing unit 45 generates a screen switching request according to the screen display information sent from the switching count processing unit 41 and sends it to the output information control means 5. Screen display and voice output are then performed (step ST6). That is, in response to the screen switching request, the screen output generation means 52 of the output information control means 5 reads the screen display text and utterance example text corresponding to the n-th recognition dictionary from the output information table 51 (see FIG. 3), generates screen output data, and sends it to the screen output means 9. As a result, the screen display text and utterance example text are displayed on the monitor 13. Since the switching count is initially set to "1" and the address dictionary is selected, a screen for recognizing an address, as shown in FIG. 6(a), is displayed on the monitor 13. Likewise, in response to the screen switching request from the screen switching processing unit 45, the voice output generation means 53 of the output information control means 5 reads the voice guidance text corresponding to the n-th recognition dictionary from the output information table 51, generates voice output data, and sends it to the voice output means 10. As a result, voice guidance is output from the speaker 14. Since the switching count is initially set to "1" and the address dictionary is selected, the voice message "Please say the address", as shown in FIG. 6(a), is output.

Next, voice recognition is started (step ST7). That is, when the dictionary switching processing unit 43 notifies the recognition engine control processing unit 44 of the control means 4 that a dictionary switching request has been generated, the recognition engine control processing unit 44 generates a recognition start command and sends it to the voice analysis processing unit 31 of the voice recognition means 3. The voice recognition processing is thereby started.

Next, it is checked whether a key input has been made within a fixed time (for example, within 1 second) (step ST8). If it is determined that a key input has been made, the dictionary number n is incremented by 1 (step ST9), and the sequence returns to step ST3. The processing described above is then repeated. Accordingly, each time the recognition start button 15 is pressed, the dictionary used for recognition changes cyclically: recognition dictionary #1 (address dictionary) → recognition dictionary #2 (facility name dictionary) → recognition dictionary #3 (telephone number dictionary) → recognition dictionary #4 (song title dictionary) → recognition dictionary #1 (address dictionary), and so on. The screen display on the monitor 13 and the voice output from the speaker 14 likewise change cyclically: the state shown in FIG. 6(a) → the state shown in FIG. 6(b) → the state shown in FIG. 6(c) → the state shown in FIG. 6(d) → the state shown in FIG. 6(a), and so on. FIG. 6(a) shows the display screen and voice output when the recognition start button 15 has been pressed 1, 5, 9, ... times; FIG. 6(b) when it has been pressed 2, 6, 10, ... times; FIG. 6(c) when it has been pressed 3, 7, 11, ... times; and FIG. 6(d) when it has been pressed 4, 8, 12, ... times.
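The loop through steps ST3 to ST9 can be approximated by the following Python sketch, which takes the times at which the recognition start button was pressed and returns the dictionary in effect when the waiting period expires. Several points are simplifications or assumptions: the waiting period is measured from the previous press rather than from the start of recognition in step ST7, every key input is taken to be the recognition start button, and the function name `dictionary_after_presses` and the default `window` of 1 second (the example value given in the text) are chosen for illustration.

```python
from typing import Optional, Sequence


def dictionary_after_presses(
    press_times: Sequence[float],
    window: float = 1.0,        # "within 1 second" is the example in the text
    num_dictionaries: int = 4,
) -> Optional[int]:
    """Return the dictionary number used once no further press arrives in time."""
    if not press_times:
        return None  # still waiting in step ST2
    n = 1  # the first press selects dictionary #1
    for previous, current in zip(press_times, press_times[1:]):
        if current - previous <= window:
            n += 1  # step ST9: another press arrived within the window
        else:
            break   # window expired: recognition (ST10) ran with the current dictionary
    # Dictionaries are selected cyclically: #1, #2, #3, #4, #1, ...
    return (n - 1) % num_dictionaries + 1
```

For instance, four presses spaced 0.4 seconds apart select dictionary #4 (song titles), and a fifth quick press wraps back to dictionary #1.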

If it is determined in step ST8 that no key input was made within the fixed time, the voice recognition processing is executed (step ST10). That is, the speech input from the microphone 12 is converted into voice data by the voice input means 8 and sent to the voice analysis processing unit 31. The voice analysis processing unit 31 analyzes the voice data sent from the voice input means 8 and sends the analysis result to the matching processing unit 32. The matching processing unit 32 executes matching processing that compares the analysis result sent from the voice analysis processing unit 31 with the vocabulary of the recognition dictionary selected at that time, and sends the resulting recognized vocabulary and its score as the recognition result to the recognition result determination processing unit 46 of the control means 4. The recognition result determination processing unit 46 fixes the recognized vocabulary based on the recognized vocabulary and score sent as the recognition result from the matching processing unit 32 of the voice recognition means 3, and sends it to the screen switching processing unit 45. The screen switching processing unit 45 sends the recognized vocabulary received from the recognition result determination processing unit 46 to the output information control means 5.

Next, the recognition result is presented (step ST11). That is, the screen output generation means 52 of the output information control means 5, having received the recognized vocabulary from the screen switching processing unit 45, generates screen output data based on the recognized vocabulary and sends it to the screen output means 9. As a result, the text of the recognition result is displayed on the monitor 13. In addition, the voice output generation means 53 of the output information control means 5 generates voice output data based on the recognized vocabulary from the screen switching processing unit 45 and sends it to the voice output means 10. As a result, the recognition result is output as speech from the speaker 14.

If it is determined in step ST3 that the key event is not a press of the recognition start button 15, the speech recognition process is terminated if it has already been started (step ST13), and processing corresponding to the key code is executed (step ST14).

In the speech recognition apparatus according to Embodiment 1 configured as described above, a user who wants to search for an address presses the recognition start button 15 once and then speaks into the microphone 12, and a user who wants to search for a song title presses the recognition start button 15 four times and then speaks into the microphone 12. In this way, by pressing the recognition start button 15 and speaking once, the target vocabulary can be recognized with a high recognition rate.

As described above, the speech recognition apparatus according to Embodiment 1 of the present invention is configured so that the dictionary used for speech recognition is selected by pressing the recognition start button 15 before the recognition process is performed. The vocabulary can therefore be narrowed down through a reliable choice of recognition dictionary, and the target vocabulary can be recognized with a high recognition rate. In addition, when the recognition dictionary is switched, the newly selected recognition dictionary is displayed on the monitor 13 and announced by voice, so the recognizable vocabulary can be presented to the user. Furthermore, the recognition dictionary used for speech recognition is selected by the number of times the recognition start button 15 is pressed, so the button that starts speech recognition also serves as the selector, and the speech recognition apparatus can be configured simply and inexpensively. Finally, even for continuous words rather than single sounds, the target vocabulary can be recognized with a high recognition rate by pressing the recognition start button 15 and uttering the continuous word once.

Embodiment 2.
In the speech recognition apparatus according to Embodiment 2 of the present invention, the speech recognition dictionary 1 is made up of a plurality of recognition dictionaries classified by the first phonetic character of each vocabulary item.

The configuration of the speech recognition apparatus according to Embodiment 2 is the same as that of the speech recognition apparatus according to Embodiment 1 shown in FIG. 1, except for the configuration of the speech recognition dictionary 1. The speech recognition dictionary 1 is made up of ten recognition dictionaries #1 to #10 classified by the rows of the Japanese syllabary (gojūon) table. Recognition dictionary #1 is the "A-row dictionary", containing vocabulary that begins with a character of the A row (あ, い, う, え, お), for example 江ノ島水族館 (Enoshima Aquarium). Recognition dictionary #2 is the "Ka-row dictionary", containing vocabulary that begins with a character of the Ka row (か, き, く, け, こ), for example 葛西臨海公園 (Kasai Rinkai Park). Similarly, recognition dictionary #3 is the "Sa-row dictionary", #4 the "Ta-row dictionary", #5 the "Na-row dictionary", #6 the "Ha-row dictionary", #7 the "Ma-row dictionary", #8 the "Ya-row dictionary", #9 the "Ra-row dictionary", and #10 the "Wa-row dictionary".
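As a rough illustration of this classification, the sketch below assigns a vocabulary item to one of the ten row dictionaries from the first character of its hiragana reading. Several points are assumptions that the description does not state: the readings are supplied by hand, voiced and semi-voiced characters are grouped with their unvoiced rows, を and ん are placed in the Wa row, and the function and table names are hypothetical.

```python
# Row order matches recognition dictionaries #1 to #10 of Embodiment 2.
# Grouping of voiced/semi-voiced kana with their base rows is an assumption.
ROWS = [
    ("A", "あいうえお"),
    ("Ka", "かきくけこがぎぐげご"),
    ("Sa", "さしすせそざじずぜぞ"),
    ("Ta", "たちつてとだぢづでど"),
    ("Na", "なにぬねの"),
    ("Ha", "はひふへほばびぶべぼぱぴぷぺぽ"),
    ("Ma", "まみむめも"),
    ("Ya", "やゆよ"),
    ("Ra", "らりるれろ"),
    ("Wa", "わをん"),
]


def row_dictionary_number(reading):
    """Return the dictionary number (1 to 10) for a hiragana reading."""
    if not reading:
        raise ValueError("reading must not be empty")
    first = reading[0]
    for number, (_, kana) in enumerate(ROWS, start=1):
        if first in kana:
            return number
    raise ValueError(f"no row dictionary for character {first!r}")
```

With the readings えのしますいぞくかん and かさいりんかいこうえん, the two examples from the text fall into dictionaries #1 and #2 respectively.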

The operation of the speech recognition apparatus according to Embodiment 2 configured as described above is the same as that of the speech recognition apparatus according to Embodiment 1, except that pressing the recognition start button 15 switches in turn through the ten recognition dictionaries #1 to #10 that make up the speech recognition dictionary 1.

As described above, in the speech recognition apparatus according to Embodiment 2 of the present invention, the recognition dictionary is switched in turn each time the recognition start button 15 is pressed, and at the same time content corresponding to the selected recognition dictionary (for example, "You can say words beginning with あ, い, う, え, or お") is output on the screen of the monitor 13 and from the speaker 14. The user therefore switches to the recognition dictionary that contains the vocabulary to be spoken and then speaks. For example, to say 葛西臨海公園 (Kasai Rinkai Park), which belongs to the Ka-row dictionary, the user presses the recognition start button 15 twice and then speaks. In this way, by pressing the recognition start button 15 and speaking once, the target vocabulary can be recognized with a high recognition rate.

Although in Embodiment 2 the speech recognition dictionary 1 is made up of ten recognition dictionaries #1 to #10 classified by the rows of the Japanese syllabary table, it may instead be made up of five recognition dictionaries #1 to #5 classified by the columns of the table (for example ア, カ, サ, タ, ナ, ハ, マ, ヤ, ラ, ワ), fifty recognition dictionaries #1 to #50 classified by individual phonetic character, or a plurality of recognition dictionaries in which arbitrary phonetic characters are grouped together.

Embodiment 3.
The speech recognition apparatus according to Embodiment 3 of the present invention allows a specific recognition dictionary to be kept set as a recognition target dictionary at all times. For example, a recognition dictionary containing high-priority vocabulary can be assigned as the specific recognition dictionary.

The configuration of the speech recognition apparatus according to Embodiment 3 is the same as that of the speech recognition apparatus according to Embodiment 1 shown in FIG. 1, except for the contents stored in the management table 21 held in the recognition dictionary management means 2.

FIG. 7 shows the configuration of the management table 21 held in the recognition dictionary management means 2. The management table 21 stores a dictionary number (#1 to #4), a dictionary name (address, facility name, telephone number, song title), and a state (invalid, valid, or always valid). As with the management table 21 of Embodiment 1, one recognition dictionary is selected as the recognition target in response to a dictionary switching request sent from the control means 4; the state of the selected recognition dictionary is set to "valid" and the other recognition dictionaries are set to "invalid". However, a recognition dictionary set to "always valid" is not changed by a dictionary switching request generated by a normal press of the recognition start button 15.

To set a desired recognition dictionary to "always valid", the recognition start button 15 is long-pressed while the screen corresponding to that recognition dictionary is displayed. For example, to make the song title dictionary always valid, the recognition start button 15 is long-pressed while the screen shown in FIG. 6(d) is output on the monitor 13. The song title dictionary is then set to "always valid" and is always treated as a recognition target dictionary. Because the recognition dictionary can still be switched while the song title dictionary remains a recognition target, a plurality of recognition dictionaries, for example the song title dictionary and the facility name dictionary, can be recognition target dictionaries at the same time. When speech recognition is performed, the recognition dictionary management means 2 sends the contents of the recognition dictionaries set to valid and always valid to the speech recognition means 3.
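The three-state management table can be sketched as follows. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the rule that a normal switching request simply skips always-valid entries is one reading of the description, not a confirmed implementation.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    VALID = "valid"
    ALWAYS_VALID = "always valid"


class ManagementTable:
    """Hypothetical model of management table 21 in Embodiment 3."""

    def __init__(self, names):
        self.names = list(names)
        self.states = [State.INVALID] * len(self.names)

    def switch_to(self, number):
        """Normal press: dictionary `number` becomes valid, others invalid.

        Entries marked always valid are left untouched.
        """
        for i in range(len(self.states)):
            if self.states[i] is State.ALWAYS_VALID:
                continue
            self.states[i] = State.VALID if i == number - 1 else State.INVALID

    def set_always_valid(self, number):
        """Long press: pin dictionary `number` as a recognition target."""
        self.states[number - 1] = State.ALWAYS_VALID

    def active(self):
        """Names of the dictionaries passed to the recognizer."""
        return [n for n, s in zip(self.names, self.states) if s is not State.INVALID]
```

For instance, pinning dictionary #4 (song title) and then switching to #2 leaves both the facility name and song title dictionaries active, which matches the example in the text.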

Next, the operation of the speech recognition apparatus according to Embodiment 3 of the present invention is described with reference to the flowchart shown in FIG. 8. The flowchart of FIG. 8 is obtained by adding steps ST21 and ST22 to the flowchart of Embodiment 1 shown in FIG. 5. Only the parts that differ from the flowchart of FIG. 5 are described below.

If it is determined in step ST3 that the key event comes from the recognition start button 15, it is then checked whether the recognition start button 15 has been long-pressed (step ST21). Specifically, the switching count processing unit 41 measures the time from receiving the key-press event from the manual input means 6 until receiving the key-release event, and judges that a long press has occurred when this time is equal to or greater than a predetermined value. If it is determined in step ST21 that the recognition start button 15 has been long-pressed, the dictionary is set to "always valid" (step ST22). That is, the switching count processing unit 41 sends the dictionary number of the recognition dictionary corresponding to the currently selected screen to the dictionary switching processing unit 43. The dictionary switching processing unit 43 generates an always-valid setting request according to this dictionary number and sends it to the recognition dictionary management means 2. The recognition dictionary management means 2 sets the state of the corresponding recognition dictionary to "always valid" in response to the request. If it is determined in step ST21 that the recognition start button 15 has not been long-pressed, the sequence proceeds to step ST4, and the processing already described with reference to the flowchart of FIG. 5 is executed.
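The long-press check of step ST21 amounts to comparing the interval between the key-press and key-release events with a threshold. In the sketch below the one-second threshold and the function name are assumptions; the description only specifies "a predetermined value".

```python
# Assumed threshold; the source only says "a predetermined value".
LONG_PRESS_SECONDS = 1.0


def is_long_press(pressed_at, released_at, threshold=LONG_PRESS_SECONDS):
    """Return True when the button was held for at least `threshold` seconds."""
    if released_at < pressed_at:
        raise ValueError("release time precedes press time")
    return (released_at - pressed_at) >= threshold
```

A press that satisfies this check would lead to step ST22, and any other press would continue to step ST4.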

As described above, in the speech recognition apparatus according to Embodiment 3 of the present invention, a recognition dictionary containing, for example, high-priority vocabulary can be set to "always valid" by long-pressing the recognition start button 15, while the other recognition dictionaries can still be switched by pressing the recognition start button 15, so the range of vocabulary subject to recognition can be widened. In addition, the always-valid dictionary is selected by long-pressing the same recognition start button 15 that starts speech recognition, so the speech recognition apparatus can be configured simply and inexpensively.

A recognition dictionary set to always valid can be returned to the invalid state by, for example, long-pressing the recognition start button 15 again while the screen corresponding to that recognition dictionary is output on the monitor 13.

Embodiment 4.
The speech recognition apparatus according to Embodiment 4 of the present invention modifies the speech recognition apparatus according to Embodiment 2 so that the user utters a vocabulary item with a qualifier attached, the utterance is recognized, and the vocabulary is narrowed down from the recognition result.

As described above, the names of the plurality of recognition dictionaries stored in the management table 21 held in the recognition dictionary management means 2 are displayed on the monitor 13. The user can therefore see how the recognition dictionaries held by the speech recognition apparatus are classified, and utters the recognition vocabulary with the dictionary name added as a qualifier, that is, utters the dictionary name together with the recognition vocabulary. For example, the user says 「カ行の葛西臨海公園」 ("Kasai Rinkai Park of the Ka row").

As shown in FIG. 9, the configuration of the speech recognition apparatus according to Embodiment 4 differs from that of the speech recognition apparatus according to Embodiment 2 in that a dictionary selection command is sent from the recognition result determination processing unit 46 of the control means 4 to the recognition dictionary management means 2. The functions of the recognition dictionary management means 2 and of the recognition result determination processing unit 46 also differ from those in Embodiment 2. Only the differences from the speech recognition apparatus according to Embodiment 2 are described below.

The recognition result determination processing unit 46 and the recognition dictionary management means 2 of the speech recognition apparatus according to Embodiment 4 are obtained by adding the following functions to those of Embodiment 2. When the recognition result sent from the matching processing unit 32 of the speech recognition means 3 is a vocabulary item such as 「ア行」 (A row), 「カ行」 (Ka row), and so on, the recognition result determination processing unit 46 sends the recognition dictionary management means 2 a dictionary selection command to select the recognition dictionary corresponding to that vocabulary item. The recognition dictionary management means 2 selects one recognition dictionary as the recognition target according to the dictionary selection command sent from the recognition result determination processing unit 46, sets the state of the selected recognition dictionary to valid, and sets the other recognition dictionaries to invalid.

Next, the operation of the speech recognition apparatus according to Embodiment 4 of the present invention is described. Suppose that 「カ行の葛西臨海公園」 is uttered. The speech recognition means 3 first recognizes the vocabulary item 「カ行」 (Ka row) and notifies the recognition result determination processing unit 46 accordingly. On receiving the notification that 「カ行」 has been recognized, the recognition result determination processing unit 46 sends the recognition dictionary management means 2 a dictionary selection command to select the Ka-row dictionary. On receiving this command, the recognition dictionary management means 2 selects the Ka-row recognition dictionary. The Ka-row recognition dictionary is therefore used in the next speech recognition.

Next, the speech recognition means 3 recognizes the vocabulary item 「葛西臨海公園」 (Kasai Rinkai Park) and notifies the recognition result determination processing unit 46 of the recognized vocabulary and its score. The recognition result determination processing unit 46 determines the recognized vocabulary based on the recognized vocabulary and score sent as the recognition result from the speech recognition means 3, and sends it to the screen switching processing unit 45. The screen switching processing unit 45 sends the recognized vocabulary received from the recognition result determination processing unit 46 to the output information control means 5. As a result, the text 「葛西臨海公園」 is displayed on the monitor 13 as the recognition result, and the speech 「葛西臨海公園」 is output from the speaker 14.
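The two-stage flow above (recognize the row qualifier, select the dictionary, then recognize the remaining vocabulary) can be illustrated on text rather than audio. The sketch below is an assumption-laden simplification: it works on an already transcribed string, assumes the qualifier is joined to the vocabulary with の, and uses a hypothetical function name and lookup table.

```python
# Qualifier-to-dictionary mapping follows the row dictionaries of Embodiment 2.
ROW_QUALIFIERS = {
    "ア行": 1, "カ行": 2, "サ行": 3, "タ行": 4, "ナ行": 5,
    "ハ行": 6, "マ行": 7, "ヤ行": 8, "ラ行": 9, "ワ行": 10,
}


def split_qualified_utterance(text):
    """Return (dictionary number or None, remaining vocabulary)."""
    for qualifier, number in ROW_QUALIFIERS.items():
        prefix = qualifier + "の"  # assumed joining particle
        if text.startswith(prefix):
            return number, text[len(prefix):]
    return None, text
```

In this sketch the returned dictionary number plays the role of the dictionary selection command, and the remaining string is what the second recognition pass would match against the selected dictionary.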

As described above, in the speech recognition apparatus according to Embodiment 4 of the present invention, a single utterance by the user narrows down the dictionary before the recognition process is executed, so high recognition accuracy can be obtained.

Embodiment 5.
The speech recognition apparatus according to Embodiment 5 of the present invention modifies the speech recognition apparatus according to Embodiment 2 so that, when the speech recognition means 3 cannot narrow the recognition result down to a single candidate, the user is prompted to speak again with words added that narrow down the vocabulary.

The configuration of the speech recognition apparatus according to Embodiment 5 is the same as that of the speech recognition apparatus according to Embodiment 4, except for the function of the recognition result determination processing unit 46. Only the differences from the speech recognition apparatus according to Embodiment 4 are described below.

When the recognized vocabulary and score sent as the recognition result from the matching processing unit 32 of the speech recognition means 3 do not satisfy a predetermined condition, for example when a plurality of recognized vocabulary items with scores below a predetermined value are obtained, the recognition result determination processing unit 46 generates a message prompting the user to speak again and sends it to the screen switching processing unit 45. The screen switching processing unit 45 sends the message received from the recognition result determination processing unit 46 to the output information control means 5. As a result, text prompting the user to speak again is displayed on the monitor 13, and speech prompting the user to speak again is output from the speaker 14. For example, when the user says 「赤坂」 (Akasaka) and the speech recognition means 3 returns two candidates, 「赤坂」 (Akasaka) and 「高坂」 (Takasaka), the message 「あいうえおの赤坂ですか、たちつてとの高坂ですか？」 ("Is it Akasaka of a-i-u-e-o, or Takasaka of ta-chi-tsu-te-to?") is displayed on the monitor 13 and output as speech from the speaker 14.

If the user responds by speaking again, for example 「あいうえおの赤坂」 ("Akasaka of a-i-u-e-o"), the speech recognition means 3 first recognizes the vocabulary 「あいうえおの赤坂」 and notifies the recognition result determination processing unit 46 accordingly. On receiving the notification that 「あいうえおの赤坂」 has been recognized, the recognition result determination processing unit 46 sends the recognition dictionary management means 2 a dictionary selection command to select the A-row dictionary. On receiving this command, the recognition dictionary management means 2 selects the A-row recognition dictionary, so that the A-row recognition dictionary is set for use. The speech recognition means 3 then executes the recognition process again using the A-row recognition dictionary. As a result, 「赤坂」 (Akasaka) is obtained as the final recognition result.
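The decision to ask again and the wording of the prompt can be sketched as below. The numeric score scale, the threshold value, and both function names are assumptions made for illustration; the description only states that a prompt is issued when several candidates score below a predetermined value, and gives the example message.

```python
def needs_reprompt(candidates, threshold):
    """True when several candidates are returned and all score below threshold.

    `candidates` is a list of (word, score) pairs; the score scale is assumed.
    """
    return len(candidates) > 1 and all(score < threshold for _, score in candidates)


def build_reprompt(options):
    """Build the disambiguation message from (row characters, word) pairs."""
    return "、".join(f"{row}の{word}ですか" for row, word in options) + "？"
```

For the Akasaka/Takasaka example, `build_reprompt([("あいうえお", "赤坂"), ("たちつてと", "高坂")])` reproduces the message quoted in the text.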

As described above, the speech recognition apparatus according to Embodiment 5 of the present invention is configured to obtain a recognition result using the recognition dictionary selected in response to the message prompting the user to speak again, so the recognition results can be narrowed down and the recognition rate can be improved.

FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing the contents of the management table used in the speech recognition apparatus according to Embodiment 1 of the present invention.
FIG. 3 is a diagram showing the contents of the output information table used in the speech recognition apparatus according to Embodiment 1 of the present invention.
FIG. 4 is a diagram showing the contents of the switching correspondence table used in the speech recognition apparatus according to Embodiment 1 of the present invention.
FIG. 5 is a flowchart for explaining the operation of the speech recognition apparatus according to Embodiment 1 of the present invention.
FIG. 6 is a diagram for explaining examples of the screen display and voice output produced by the speech recognition apparatus according to Embodiment 1 of the present invention.
FIG. 7 is a diagram showing the contents of the management table used in the speech recognition apparatus according to Embodiment 3 of the present invention.
FIG. 8 is a flowchart for explaining the operation of the speech recognition apparatus according to Embodiment 3 of the present invention.
FIG. 9 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 4 of the present invention.

Explanation of symbols

1 speech recognition dictionary, 2 recognition dictionary management means, 3 speech recognition means, 4 control means, 5 output information control means, 6 manual input means, 7 key code discrimination means, 8 voice input means, 9 screen output means (output means), 10 voice output means (output means), 11 remote control, 12 microphone, 13 monitor (output means), 14 speaker (output means), 15 recognition start button, 21 management table, 31 speech analysis processing unit, 32 matching processing unit, 41 switching count processing unit, 42 switching correspondence table, 43 dictionary switching processing unit, 44 recognition engine control processing unit, 45 screen switching processing unit, 46 recognition result determination processing unit, 51 output information table, 52 screen output generation means, 53 voice output generation means.

Claims

A speech recognition apparatus comprising:
a plurality of recognition dictionaries for recognizing speech;
a recognition start button for starting speech recognition;
control means for setting one of the plurality of recognition dictionaries as valid according to the number of times the recognition start button is operated; and
speech recognition means for performing speech recognition using the recognition dictionary set as valid by the control means,
wherein the control means sets the selected recognition dictionary as always valid when the recognition start button is pressed continuously for a predetermined time or longer while one recognition dictionary is selected, and
the speech recognition means performs speech recognition using the recognition dictionaries set as valid and always valid.

The speech recognition apparatus according to claim 1, wherein the plurality of recognition dictionaries are classified by vocabulary type and comprise an address dictionary containing vocabulary used for addresses, a facility name dictionary containing vocabulary used for facility names, a telephone number dictionary containing vocabulary used for telephone numbers, and a song title dictionary containing vocabulary used for song titles.

The speech recognition apparatus according to claim 1, wherein the plurality of recognition dictionaries are classified according to the first phonetic character of the vocabulary.

The speech recognition apparatus according to claim 3, wherein the plurality of recognition dictionaries are classified by the rows of the Japanese syllabary table and comprise an "A-row dictionary", a "Ka-row dictionary", a "Sa-row dictionary", a "Ta-row dictionary", a "Na-row dictionary", a "Ha-row dictionary", a "Ma-row dictionary", a "Ya-row dictionary", a "Ra-row dictionary", and a "Wa-row dictionary".

The speech recognition apparatus according to claim 1, further comprising output means for outputting information representing the classification of the plurality of recognition dictionaries,
wherein, when an utterance of the information representing the classification output by the output means is recognized by the speech recognition means, the control means sets one of the plurality of recognition dictionaries as valid according to that utterance.

The speech recognition apparatus according to claim 1, further comprising output means for outputting information representing a message,
wherein, when a plurality of recognized vocabulary items are obtained by speech recognition in the speech recognition means, the control means outputs to the output means a message prompting an utterance that narrows down the vocabulary, and sets one of the plurality of recognition dictionaries as valid according to the utterance made in response to the message, and
the speech recognition means performs speech recognition again using the recognition dictionary set as valid by the control means.