JP2004062517A

JP2004062517A - Voice controller, voice control method and program

Info

Publication number: JP2004062517A
Application number: JP2002219831A
Authority: JP
Inventors: Minako Miyamoto; 宮本　美奈子
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-07-29
Filing date: 2002-07-29
Publication date: 2004-02-26

Abstract

PROBLEM TO BE SOLVED: To provide a convenient user interface that accepts input from both a voice interface and a GUI, so that the two can be used selectively according to the situation.

SOLUTION: A voice controller includes a dictionary retaining means 3 for retaining a dictionary in which rules for converting voice into character strings are stored; a voice recognition means 2 for converting input voice into characters by consulting the dictionary; a command conversion means 4 for converting the recognized character strings into command strings and setting the command strings in the GUI; a GUI supervising means 5 for supervising the state of the GUI; a coordination data retaining means 6 for retaining coordination data that coordinates the state of the GUI recognized by the GUI supervising means 5 with the method of voice recognition and the dictionary which the voice recognition means 2 consults; and a coordination changing means 7 for reading the coordination data corresponding to the state of the GUI from the coordination data retaining means 6 and changing the method of voice recognition and the dictionary which the voice recognition means 2 consults.

COPYRIGHT: (C)2004,JPO

Description

[0001]
[Technical field of the invention]
The present invention relates to a voice control device, a voice control method, and a program, and more particularly, to a user interface using a voice recognition technology.
[0002]
[Prior art]
As a conventional technique, there is known a technique in which windows and applications are controlled by voice, such as a "computer system" disclosed in Japanese Patent Application Laid-Open No. 10-222337. In this method, control commands for windows and applications are associated with keywords, and the keywords are registered as a dictionary for speech recognition. When the keyword is recognized from the voice input, a control command corresponding to the keyword is executed.
[0003]
In such speech recognition, a common dictionary and a dictionary for each application are prepared. By preparing a dictionary for each application, it becomes possible to assign a different operation to each application even for the same utterance. The dictionary for the application is configured to switch when the target application is activated by voice or when the window is switched.
[0004]
[Problems to be solved by the invention]
In the above-described method, the application dictionary is switched according to the recognition result of the input voice, rather than by monitoring the state of the GUI. Consequently, if the selection state of a window or application is changed by a means other than voice input, such as a mouse or a touch panel, the dictionary is not switched to the corresponding one. The selection state of the application or window and the recognition dictionary then fall out of correspondence, so that control by voice can no longer be continued.
[0005]
In general, a voice user interface has the advantage that, when one item is selected from many, the space needed to display the candidates and the number of selection steps can be reduced. For example, when selection from many items is implemented with a graphical user interface (GUI) such as a pull-down menu, either the menu must be given a hierarchical structure so that menu display and selection are repeated many times, or time is required to find the desired item among a huge number of displayed candidates.
[0006]
An object of the present invention is to provide a voice control device, a voice control method, and a program that realize an easy-to-use user interface by accepting input from both the voice and GUI user interfaces, so that the two can be used selectively depending on the situation.
[0007]
[Means for Solving the Problems]
A first voice control device of the present invention comprises: a graphical user interface (GUI); dictionary holding means for holding a dictionary storing rules for converting voice into character strings; voice recognition means for converting input voice into characters by referring to the dictionary; command conversion means for converting a character string recognized by the voice recognition means into a command string and setting it in the GUI; GUI monitoring means for monitoring the state of the GUI; association data holding means for holding association data that associates the state of the GUI recognized by the GUI monitoring means with the voice recognition method or dictionary referred to by the voice recognition means; and association changing means for reading the association data corresponding to the state of the GUI from the association data holding means and switching the voice recognition method or dictionary referred to by the voice recognition means.
[0008]
A second voice control device of the present invention is the first voice control device further comprising: GUI acquisition means for acquiring identification information of a selected GUI; a setting method database for holding setting method data including methods of setting a recognition result of the voice recognition means in the GUI; and association means for generating association data that associates the identification information of the GUI, the dictionary to be used when the GUI is selected, and the method of setting to the GUI held in the setting method database, and for storing the association data in the association data holding means.
[0009]
A first voice control method of the present invention comprises: a voice recognition step of converting input voice into characters by referring to a dictionary storing rules for converting voice into character strings; a command conversion step of converting the character string recognized in the voice recognition step into a command string and setting it in a graphical user interface (GUI); and an association changing step of reading association data corresponding to the state of the GUI from association data holding means, which holds association data associating the state of the GUI with the voice recognition method or dictionary referred to for voice recognition, and switching the voice recognition method or dictionary referred to in the voice recognition step.
[0010]
A second voice control method of the present invention is the first voice control method further comprising: a GUI acquisition step of acquiring identification information of a selected GUI; and an association step of generating association data that associates the identification information of the GUI, the dictionary to be used when the GUI is selected, and the method of setting the recognition result of the voice recognition step in the GUI, and storing the association data in the association data holding means.
[0011]
A first program of the present invention causes a computer to execute: a voice recognition step of converting input voice into characters by referring to a dictionary storing rules for converting voice into character strings; a command conversion step of converting the character string recognized in the voice recognition step into a command string and setting it in a graphical user interface (GUI); and an association changing step of reading association data corresponding to the state of the GUI from association data holding means, which holds association data associating the state of the GUI with the voice recognition method or dictionary referred to for voice recognition, and switching the voice recognition method or dictionary referred to in the voice recognition step.
[0012]
A second program of the present invention is the first program further causing the computer to execute: a GUI acquisition step of acquiring identification information of a selected GUI; and an association step of generating association data that associates the identification information of the GUI, the dictionary to be used when the GUI is selected, and the method of setting the recognition result of the voice recognition step in the GUI, and storing the association data in the association data holding means.
[0013]
[Embodiments of the invention]
Next, a first embodiment of the present invention will be described in detail with reference to the drawings.
Referring to FIG. 1, the first embodiment of the present invention comprises: a GUI 1 including input means such as a keyboard, output means such as a display device, and various graphical user interfaces (GUIs); voice recognition means 2 for converting voice data input through a voice input device such as a microphone into characters; dictionary holding means 3 for holding dictionaries used to convert voice data into characters; command conversion means 4 for converting the characters produced by the voice recognition means 2 into commands and inputting them to the GUI 1; GUI monitoring means 5 for monitoring the state of the GUI 1; association data holding means 6 for holding association data that associates the state of the GUI 1 with the voice recognition method of the voice recognition means 2 and the dictionary to be activated; and association changing means 7 which, when the state of the GUI 1 changes, receives a notification from the GUI monitoring means 5, reads the association data corresponding to the currently active GUI 1 from the association data holding means 6, and switches the recognition method of the voice recognition means 2 and the dictionary to be activated.
[0014]
The operation of the present embodiment will be described in detail with reference to FIGS. 2 to 6.
FIG. 2 is a diagram for explaining the operation of the GUI 1. The GUI 1 includes a plurality of GUIs corresponding to various interfaces with the user. FIG. 2 shows, as an example of the GUI, a screen on which window 21, window 22, and window 23 have been opened, with the window 21 active. The window 21 consists of a text box 202, a list box 203, and a button 204.
[0015]
The window 21 shows an operation of inputting a station name in a text box 202 with a keyboard, selecting a time from a list box 203, and clicking a button 204 to perform a search.
[0016]
FIG. 3 is an example of data stored in the association data holding unit 6.
The association data stores rules for switching dictionaries according to the active GUI. In the example of FIG. 3, N pieces of association data of data 1 to data N are stored. Each association data includes an active application name, a window name, a GUI name, a dictionary name to be switched, and a command conversion method. The command conversion method means a method of controlling the GUI using the recognition result when the voice is recognized.
[0017]
Referring to FIG. 3, data 1 means that when the text box 202 on the window 21 of application 1 becomes active, the dictionary is switched to the general place name strengthening dictionary 1, and that, once this switch has been made, the recognition result is inserted into the text box 202.
[0018]
Data 2 means that when the list box 203 on the window 21 of application 1 becomes active, the dictionary is switched to the time dictionary 2, and that, once this switch has been made, the display of the text box 202 is switched according to the recognition result.
[0019]
Data 3 means that when neither the text box 202 nor the list box 203 is active in the window 21 of application 1, the dictionary is switched to the menu/button dictionary 3, and that, once this switch has been made, the button or menu whose GUI name matches the recognition result is clicked.
[0020]
Data N means that in any GUI state not specified by the other data, the application dictionary N is activated, and that, once it has been activated, the application whose name matches the recognition result is launched.
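
The association data of FIG. 3 can be pictured as an ordered rule table. The following Python sketch is illustrative only: the `Association` record, the use of `None` as a wildcard, the first-match lookup order, and the conversion labels are assumptions made for this example and are not specified in the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Association:
    application: Optional[str]  # None matches anything (assumed convention)
    window: Optional[str]
    gui: Optional[str]
    dictionary: str
    conversion: str  # hypothetical label for the command conversion method


# Rows modeled on data 1, 2, 3 and N of FIG. 3; more specific rows come first.
ASSOCIATIONS: List[Association] = [
    Association("application 1", "window 21", "text box 202",
                "general place name strengthening dictionary 1",
                "insert_into_text_box"),
    Association("application 1", "window 21", "list box 203",
                "time dictionary 2", "switch_display"),
    Association("application 1", "window 21", None,
                "menu/button dictionary 3", "click_matching_button_or_menu"),
    Association(None, None, None,
                "application dictionary N", "launch_matching_application"),
]


def find_association(application: str, window: str, gui: str) -> Association:
    """Return the first row whose non-wildcard fields all match."""
    for row in ASSOCIATIONS:
        if (row.application in (None, application)
                and row.window in (None, window)
                and row.gui in (None, gui)):
            return row
    raise LookupError("no association data for this GUI state")
```

With this ordering, a GUI in window 21 other than the text box or list box falls through to the menu/button row, and any other state falls through to the final catch-all row, mirroring the roles of data 3 and data N.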
[0021]
FIG. 4 is an example of the dictionaries stored in the dictionary holding unit 3. FIG. 4 illustrates two kinds of dictionary: one for the case where the speech recognition method is dictation and one for the case where it is CFG.
[0022]
FIGS. 4A and 4B are an example of a dictation dictionary. A dictation dictionary holds word string data recording the reading, notation, part of speech, appearance frequency, and so on of each word, together with appearance frequencies between words, which makes it possible to recognize a relatively wide range of utterances.
[0023]
FIG. 4A is an example of word string data for recording individual words such as the aforementioned reading, notation, and appearance frequency. As shown in FIG. 4A, the word string data includes a dictionary name, a registered part of speech, a reading, a notation, and a word appearance frequency.
[0024]
In this example, four parts of speech are registered in the word string data: “proper noun place name”, “noun”, “particle”, and “auxiliary verb”. Under “proper noun place name”, three words with the readings “toukyou”, “oosaka”, and “kyouto” are registered, and their notations are given as “東京” (Tokyo), “京都” (Kyoto), and “大阪” (Osaka). The word appearance frequency of each of these words is 2.0.
[0025]
Under “noun”, three words with the readings “eki”, “shi”, and “machi” are registered, and their notations are “駅” (station), “市” (city), and “町” (town), respectively. The word appearance frequency is 1.0 for “eki” and 0.5 for “shi” and “machi”. Under “particle”, two words, “de” and “wa”, are registered. Under “auxiliary verb”, two words, “desu” and “masu”, are registered.
[0026]
The word appearance frequency expresses how readily each individual word appears; the larger the value, the higher the probability that the word appears as a recognition result. In the above example, proper nouns are the most likely of the four parts of speech to appear. Among the nouns, “eki” is more likely to appear than “machi” or “shi”.
[0027]
FIG. 4B is a table showing the appearance frequencies between parts of speech used for dictation. In this table, the appearance frequency of “proper noun place name” and “noun” is 1.2, and the appearance frequency of “auxiliary verb” and “particle” is 0.1. This indicates that the combination of “proper noun place name” and “noun” appears more readily than the combination of “auxiliary verb” and “particle”.
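
As a rough illustration of how the two tables of FIGS. 4A and 4B could be used together, the Python sketch below stores the frequencies quoted above and ranks candidate word pairs. The scoring rule (product of the two word frequencies and the part-of-speech pair frequency), the ordered-pair key, and the score of 0.0 for pairs not listed are assumptions of this example; the description gives the frequency values but not how a recognizer combines them.

```python
from typing import Dict, Iterable, Tuple

# reading -> (part of speech, word appearance frequency), values from FIG. 4A
WORDS: Dict[str, Tuple[str, float]] = {
    "toukyou": ("proper noun place name", 2.0),
    "oosaka": ("proper noun place name", 2.0),
    "kyouto": ("proper noun place name", 2.0),
    "eki": ("noun", 1.0),
    "shi": ("noun", 0.5),
    "machi": ("noun", 0.5),
}

# (first part of speech, second part of speech) -> frequency, values from FIG. 4B
PAIR_FREQUENCY: Dict[Tuple[str, str], float] = {
    ("proper noun place name", "noun"): 1.2,
    ("auxiliary verb", "particle"): 0.1,
}


def pair_score(first: str, second: str) -> float:
    """Assumed scoring rule: word freq * word freq * part-of-speech pair freq."""
    pos1, freq1 = WORDS[first]
    pos2, freq2 = WORDS[second]
    # Pairs absent from the table score 0.0 (an assumption of this sketch).
    return freq1 * freq2 * PAIR_FREQUENCY.get((pos1, pos2), 0.0)


def best_second_word(first: str, candidates: Iterable[str]) -> str:
    """Pick the candidate that scores highest when following `first`."""
    return max(candidates, key=lambda word: pair_score(first, word))
```

Under these assumptions, “eki” is preferred over “shi” or “machi” after a place name, which matches the statement that “eki” appears more readily among the nouns.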
[0028]
FIGS. 4C and 4D are an example of a dictionary for CFG. A dictionary for CFG is described by word strings and combinations of word strings.
FIG. 4C is an example of word strings. This example shows three registered word strings: “time”, “prefix”, and “ending”. Each word string consists of readings and notations. In the word string “time”, four words are registered, with the readings “rokuji”, “gozen rokuji”, “asa rokuji”, and “asa no rokuji”, and the corresponding notation is “6:00”.
[0029]
FIG. 4D shows an example of a grammar. A grammar is defined as a permutation of recognizable word strings. This example shows three such permutations. Grammar 1 indicates that the combination of the word string “prefix”, the word string “time”, and the word string “ending” can be recognized; defining this combination makes it possible to recognize “anoo, asa no rokuji desu” (“um, it's six in the morning”). Similarly, grammar 2 indicates that the combination of the word string “prefix” and the word string “time” can be recognized, which makes it possible to recognize “eeto, asa no rokuji” (“er, six in the morning”). Grammar 3 indicates that the combination of the word string “time” and the word string “ending” can be recognized, which makes it possible to recognize “rokuji ni shite kudasai” (“please make it six o'clock”).
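
The grammar-based dictionary of FIGS. 4C and 4D can be sketched as a small matcher in Python. This is only an illustration: the input is assumed to be already segmented into romanized phrases, the contents of the “prefix” and “ending” word strings are inferred from the three example utterances (the description does not list them), and the function and variable names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Word string "time": reading -> notation, as described for FIG. 4C.
TIME: Dict[str, str] = {
    "rokuji": "6:00",
    "gozen rokuji": "6:00",
    "asa rokuji": "6:00",
    "asa no rokuji": "6:00",
}

# Readings accepted for each word string; "prefix" and "ending" entries are
# inferred from the example utterances and are assumptions of this sketch.
WORD_STRINGS: Dict[str, Set[str]] = {
    "time": set(TIME),
    "prefix": {"anoo", "eeto"},
    "ending": {"desu", "ni shite kudasai"},
}

# Grammars 1 to 3 of FIG. 4D: each is an ordered sequence of word strings.
GRAMMARS: List[Tuple[str, ...]] = [
    ("prefix", "time", "ending"),
    ("prefix", "time"),
    ("time", "ending"),
]


def recognize(phrases: Sequence[str]) -> Optional[str]:
    """Return the time notation if the phrases fit one grammar, else None."""
    for grammar in GRAMMARS:
        if len(grammar) == len(phrases) and all(
            phrase in WORD_STRINGS[name] for name, phrase in zip(grammar, phrases)
        ):
            return TIME[phrases[grammar.index("time")]]
    return None
```

Note that a bare time phrase with neither prefix nor ending is rejected by this sketch, because none of the three grammars in the example covers that sequence.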
[0030]
FIG. 5 is a flowchart for explaining the operation when the GUI is changed. The case where the active GUI of the window 21 is changed from the text box 202 to the list box 203 in the same window 21 will be described using the above specific example.
[0031]
The GUI monitoring means 5 compares the stored information of the GUI with the information of the currently active GUI, and if different, determines that the active GUI has been changed. The GUI information is described by an application name, a window name, a GUI name, and a GUI identification number. The identification number of the GUI is a number for identifying the running GUI, and if this value is recorded, it can be understood that the GUI has been changed even if there is a GUI having the same window name and the same GUI name. If there is no change in the GUI, the process ends (step A1).
[0032]
If the GUI has been changed, the current GUI information is sent to the GUI monitoring means 5 and the association changing means 7 and updated (step A2).
[0033]
In response to this, the association changing unit 7 reads the corresponding data from the association data holding unit 6 (step A3). In the above-described example, the data 2 as the association data corresponding to the case where the application name is the application 1, the window name is the window 21, and the GUI name is the list box 203 is read from the association data holding unit 6.
[0034]
The association changing unit 7 also sends a dictionary change command to the voice recognition unit 2. The name of the dictionary to switch to is sent together with the command. In response, the voice recognition unit 2 switches the dictionary (step A4). In the above-described example, the voice recognition unit 2 switches to the time dictionary 2, which is the dictionary specified by data 2.
[0035]
Further, the association changing unit 7 sends the command conversion method described in the association data read in step A3 to the command conversion unit 4, and the command conversion unit 4 changes its command conversion method accordingly (step A5). In the above-described example, the command conversion method of data 2 is sent to the command conversion unit 4. As a result, when a value is sent from the voice recognition unit 2 as a recognition result, the command conversion unit 4 changes the display to the item with the same name as the value.
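
Steps A1 to A5 of FIG. 5 can be sketched as follows in Python. The class and attribute names (`GuiInfo`, `GuiMonitor`, `Recognizer`, `Converter`, `on_poll`) and the dictionary-keyed lookup are hypothetical choices for this illustration; the description specifies the steps but not an implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class GuiInfo:
    application: str
    window: str
    gui: str
    ident: int  # GUI identification number of the running GUI


class Recognizer:
    dictionary: Optional[str] = None  # name of the active dictionary


class Converter:
    method: Optional[str] = None  # label of the active conversion method


# (application, window, GUI name) -> (dictionary name, conversion method label)
ASSOCIATIONS: Dict[Tuple[str, str, str], Tuple[str, str]] = {
    ("application 1", "window 21", "text box 202"):
        ("general place name strengthening dictionary 1", "insert_into_text_box"),
    ("application 1", "window 21", "list box 203"):
        ("time dictionary 2", "switch_display"),
}


class GuiMonitor:
    def __init__(self, recognizer: Recognizer, converter: Converter) -> None:
        self.saved: Optional[GuiInfo] = None
        self.recognizer = recognizer
        self.converter = converter

    def on_poll(self, current: GuiInfo) -> bool:
        """Return True if the active GUI changed since the last poll."""
        if current == self.saved:                      # step A1: no change
            return False
        self.saved = current                           # step A2: update saved info
        key = (current.application, current.window, current.gui)
        entry = ASSOCIATIONS.get(key)                  # step A3: read association
        if entry is not None:
            dictionary, method = entry
            self.recognizer.dictionary = dictionary    # step A4: switch dictionary
            self.converter.method = method             # step A5: switch conversion
        return True
```

Because `ident` is part of the comparison, two GUIs with the same window name and GUI name but different identification numbers are still detected as a change, as described for step A1.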
[0036]
FIG. 6 is a flowchart for explaining an operation when a voice is input to the voice recognition unit 2. The case where the text box 202 is active according to the above-described example will be described as an example.
[0037]
When voice is input to the voice recognition means 2, voice recognition is performed using the currently selected dictionary (step B1). If no recognition result is obtained, the process ends.
[0038]
When the voice is recognized, the reading closest to the input voice is returned as the recognition result (step B2). Since a value is set for each reading, this value is acquired (step B3). As described above with reference to FIG. 4, when the selected dictionary is the general place name strengthening dictionary 1 and the reading is “toukyou”, “東京駅” (Tokyo Station) is acquired as the value.
[0039]
The command conversion unit 4 has read, in advance and via the association changing unit 7, the association data corresponding to the current GUI state. On receiving the above value, it sets the value in the GUI according to the GUI setting method described in the association data (step B4). In the above-described example, when the text box 202 is active, data 1 has been read into the command conversion unit 4. When the value acquired in step B3 is “東京” (Tokyo), the command conversion unit 4 sets “東京” in the text box 202 according to data 1.
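
The flow of FIG. 6 (steps B1 to B4) can be sketched in Python as below. Acoustic recognition itself is out of scope here: the sketch assumes the recognizer has already produced the closest reading as a string, or `None` when nothing was recognized. The function names, the dictionary layout, and the use of a plain dict for the GUI state are assumptions of this example.

```python
from typing import Callable, Dict, Optional

GuiState = Dict[str, str]  # GUI name -> displayed value (simplified model)

# Reading -> value for the active dictionary; the single entry follows the
# "toukyou" example, using the value that is set in the text box.
PLACE_NAME_DICTIONARY: Dict[str, str] = {"toukyou": "東京"}


def insert_into_text_box_202(gui: GuiState, value: str) -> None:
    """Setting method corresponding to data 1: put the value in text box 202."""
    gui["text box 202"] = value


def handle_utterance(
    reading: Optional[str],
    dictionary: Dict[str, str],
    setting_method: Callable[[GuiState, str], None],
    gui: GuiState,
) -> bool:
    """Return True if a value was set in the GUI."""
    if reading is None:                 # step B1: no recognition result, end
        return False
    value = dictionary.get(reading)     # steps B2/B3: reading -> value
    if value is None:
        return False
    setting_method(gui, value)          # step B4: set the value in the GUI
    return True
```

A call such as `handle_utterance("toukyou", PLACE_NAME_DICTIONARY, insert_into_text_box_202, gui)` then leaves “東京” in the entry for text box 202.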
[0040]
Next, a second embodiment of the present invention will be described with reference to the drawings. The second embodiment relates to the generation of the association data in the first embodiment.
[0041]
As shown in FIG. 7, the second embodiment of the present invention comprises: the GUI 1; GUI acquisition means 702 for acquiring identification information of the GUI being selected; the dictionary holding means 3 for speech recognition; a setting method database 703 storing methods of setting, in the GUI, a result recognized using the above-described grammar; association means 704 for creating association data by associating the GUI identification information acquired by the GUI acquisition means 702, the dictionary to be used when that GUI is selected, and the method of setting to the GUI; and the association data holding means 6 for storing the created association data.
[0042]
Next, the operation of the second embodiment will be described based on the example used in the first embodiment. FIG. 8 is a flowchart for explaining the operation of the second embodiment. First, the application and window containing the GUI for which association data is to be created are launched, and the target GUI is made active (step C1).
[0043]
As a specific example of the operation of the following flowchart, a method of creating association data corresponding to the text box 202, the list box 203, and the button 204 in the window 21 shown in FIG. 2 will be described. Upon receiving step C1, GUI identification information is obtained (step C2).
[0044]
FIG. 9 is a diagram for explaining the identification information of the GUI. The GUI identification information includes a window name, a GUI name, and a GUI type. When the text box 202 is made active, the identification information gives window 21 as the window name, text box 202 as the GUI name, and text box as the GUI type.
[0045]
Upon receiving the identification information of the GUI from step C2, the setting method data is acquired from the setting method database 703 (step C3). A setting method is a method, defined for each GUI type, of setting the result of the voice recognition means 2 in the GUI. The setting method data consists of a setting method name, the GUI type to be set, and the method of setting to the GUI.
[0046]
FIG. 10 shows an example of the setting method database 703. The setting method database 703 stores three setting method data from setting method 1 to setting method 3. In the setting method 1, when the type of the GUI to be set is a text box, the result of the voice recognition unit 2 is set in the text box. In the setting method 2, when the type of the GUI to be set is a list box, a notation that matches the result of the voice recognition means 2 is selected from the list and displayed. In the setting method 3, when the type of the GUI to be set is a button or a menu, a button or a menu that matches the result of the voice recognition unit 2 is to be executed. For example, when the GUI identified in step C2 of FIG. 9 is a text box type, the setting method 1 is selected.
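
The setting method database of FIG. 10 amounts to a lookup from GUI type to a setting routine. The Python sketch below models widgets as plain dicts; the widget fields (`text`, `items`, `selected`, `name`, `clicks`) and the function names are hypothetical and serve only to make the three setting methods concrete.

```python
from typing import Callable, Dict, Tuple

Widget = Dict[str, object]


def set_text(widget: Widget, result: str) -> None:
    """Setting method 1: put the recognition result in a text box."""
    widget["text"] = result


def select_item(widget: Widget, result: str) -> None:
    """Setting method 2: select the list entry whose notation matches."""
    if result in widget["items"]:
        widget["selected"] = result


def click_named(widget: Widget, result: str) -> None:
    """Setting method 3: execute the button or menu whose name matches."""
    if widget["name"] == result:
        widget["clicks"] = int(widget["clicks"]) + 1


# GUI type -> (setting method name, routine)
SETTING_METHODS: Dict[str, Tuple[str, Callable[[Widget, str], None]]] = {
    "text box": ("setting method 1", set_text),
    "list box": ("setting method 2", select_item),
    "button": ("setting method 3", click_named),
    "menu": ("setting method 3", click_named),
}


def setting_method_for(gui_type: str) -> Tuple[str, Callable[[Widget, str], None]]:
    """Look up the setting method for the GUI type identified in step C2."""
    return SETTING_METHODS[gui_type]
```

Keying the table on GUI type rather than on individual GUIs means one entry serves every text box, list box, button, or menu, which is what lets step C3 select a method from the type alone.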
[0047]
Next, a dictionary corresponding to the selected GUI is selected from the dictionary holding means 3 (step C4). The dictionary holding means 3 stores a plurality of dictionaries created in advance.
[0048]
Finally, the GUI identification information acquired in step C2, the setting method acquired in step C3, and the dictionary selected in step C4 are combined into association data and stored in the association data holding means 6.
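Steps C2 to C4 and the final storing step can be summarized in a short sketch. Everything named here is an assumption for illustration: the `AssociationData` record, the `create_association` function, the in-memory store standing in for the association data holding means 6, and the dictionary name "dictionary A". The specification does not state how the dictionary is chosen in step C4, so the sketch takes it as an argument.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical mapping from GUI type to setting method name (step C3).
SETTING_METHOD_BY_TYPE: Dict[str, str] = {
    "text box": "setting method 1",
    "list box": "setting method 2",
    "button": "setting method 3",
    "menu": "setting method 3",
}


@dataclass(frozen=True)
class AssociationData:
    """Hypothetical record combining the results of steps C2 to C4."""
    window_name: str
    gui_name: str
    gui_type: str
    setting_method: str
    dictionary_name: str


def create_association(
    window_name: str,
    gui_name: str,
    gui_type: str,
    dictionary_name: str,
    store: Dict[Tuple[str, str], AssociationData],
) -> AssociationData:
    """Combine identification, setting method, and dictionary, then store them."""
    setting_method = SETTING_METHOD_BY_TYPE[gui_type]  # step C3
    data = AssociationData(
        window_name, gui_name, gui_type, setting_method, dictionary_name
    )
    store[(window_name, gui_name)] = data  # stand-in for holding means 6
    return data


# Usage: create association data for the text box 202 in the window 21.
holding_store: Dict[Tuple[str, str], AssociationData] = {}
entry = create_association(
    "window 21", "text box 202", "text box", "dictionary A", holding_store
)
```

Keying the store by window name and GUI name lets a later lookup retrieve the dictionary and setting method from the GUI state alone.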
[0049]
[Effects of the invention]
The first effect is that, because input to the GUI can be performed by voice, the GUI can be controlled without touching an input device such as a keyboard, a mouse, or a touch panel.
[0050]
The second effect is that a voice input function can be added to an existing application, because all GUIs are monitored by a program that is separate from the application being monitored.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a first embodiment of the present invention.
FIG. 2 is an example of a GUI for describing an operation of the first exemplary embodiment of the present invention.
FIG. 3 is an example of data stored in the association data holding means 6 for explaining the operation of the first exemplary embodiment of the present invention.
FIG. 4 is an example of a dictionary stored in the dictionary holding means 3 for explaining the operation of the first exemplary embodiment of the present invention.
FIG. 5 is a flowchart for explaining the operation of the first exemplary embodiment of the present invention.
FIG. 6 is a flowchart for explaining the operation of the first exemplary embodiment of the present invention.
FIG. 7 is a configuration diagram of a second embodiment of the present invention.
FIG. 8 is a flowchart for explaining the operation of the second exemplary embodiment of the present invention.
FIG. 9 is an example of information of a GUI under selection for explaining the operation of the second exemplary embodiment of the present invention.
FIG. 10 is an example of data stored in a setting method database 703 for explaining the operation of the second exemplary embodiment of the present invention.
[Explanation of symbols]
1 GUI
2 voice recognition means
3 dictionary holding means
4 command conversion means
5 GUI monitoring means
6 association data holding means
7 association changing means
21, 22, 23 windows
202 text box
203 list box
204 button
702 GUI acquisition means
703 setting method database
704 correlation means

Claims

A voice control device comprising:
a graphical user interface (GUI);
dictionary holding means for holding a dictionary storing rules for converting voice into character strings;
voice recognition means for converting input voice into a character string by referring to the dictionary;
command conversion means for converting the character string recognized by the voice recognition means into a command string and setting the command string in the GUI;
GUI monitoring means for monitoring the state of the GUI;
association data holding means for holding association data that associates the state of the GUI recognized by the GUI monitoring means with a voice recognition method or a dictionary referred to by the voice recognition means; and
association changing means for reading the association data corresponding to the state of the GUI from the association data holding means and switching the voice recognition method or the dictionary referred to by the voice recognition means.

The voice control device according to claim 1, further comprising: GUI acquisition means for acquiring identification information of a selected GUI; a setting method database holding setting method data including a method of setting the recognition result of the voice recognition means in the GUI; and association means for generating association data in which the identification information of the GUI, a dictionary used when the GUI is selected, and the setting method for the GUI held in the setting method database are associated with one another, and storing the association data in the association data holding means.

A voice control method comprising: a voice recognition step of converting input voice into a character string by referring to a dictionary storing rules for converting voice into character strings; a command conversion step of converting the character string recognized in the voice recognition step into a command string and setting the command string in a graphical user interface (GUI); and an association changing step of reading association data corresponding to the state of the GUI from association data holding means that holds association data associating the state of the GUI with a voice recognition method or a dictionary referred to in the voice recognition step, and changing the voice recognition method and the dictionary referred to in the voice recognition step.

The voice control method according to claim 3, further comprising: a GUI acquisition step of acquiring identification information of a selected GUI; and an association step of generating association data in which the identification information of the GUI, a dictionary used when the GUI is selected, and a method of setting the recognition result of the voice recognition step in the GUI are associated with one another, and storing the association data in the association data holding means.

A program for causing a computer to execute: a voice recognition step of converting input voice into a character string by referring to a dictionary storing rules for converting voice into character strings; a command conversion step of converting the character string recognized in the voice recognition step into a command string and setting the command string in a graphical user interface (GUI); and an association changing step of reading association data corresponding to the state of the GUI from association data holding means that holds association data associating the state of the GUI with a voice recognition method or a dictionary referred to in the voice recognition step, and switching the voice recognition method or the dictionary referred to in the voice recognition step.

The program according to claim 5, further causing the computer to execute: a GUI acquisition step of acquiring identification information of a selected GUI; and an association step of generating association data in which the identification information of the GUI, a dictionary used when the GUI is selected, and a method of setting the recognition result of the voice recognition step in the GUI are associated with one another, and storing the association data in the association data holding means.