JP2005091611A - Information terminal, speech recognition server, and speech recognition system

Info

Publication number: JP2005091611A
Application number: JP2003323372A
Authority: JP
Inventors: Reiko Okada; 玲子岡田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2003-09-16
Filing date: 2003-09-16
Publication date: 2005-04-07
Anticipated expiration: 2023-09-16
Also published as: JP4413564B2

Abstract

PROBLEM TO BE SOLVED: To reduce ineffective data communication by enhancing the reliability of speech data transmitted to a speech recognition server.

SOLUTION: In an onboard speech recognition system that performs speech recognition processing on the server side, an onboard information apparatus 200 judges the amount of noise in input speech data from its S/N ratio in a speech data determination part 207, and a determination part 203 decides whether the speech data can be recognized correctly based on information about noise-causing factors such as the traveling speed of the vehicle and the open/closed state of a window. Only speech data judged to be recognizable is transmitted to the speech recognition server 300 through a communication part 210.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an information terminal, a voice recognition server, and a voice recognition system using them.

Information systems that use voice for input and output are used in various fields, such as in-vehicle information systems and home network systems. Such systems include a mechanism for removing noise signals from input voice data so that the voice data can be recognized correctly.

For example, the conventional in-vehicle voice recognition device described in Patent Document 1 removes a noise signal, obtained through a noise-collecting microphone, from a voice signal obtained through the speaker's microphone, thereby achieving a high S/N ratio and raising the voice recognition rate.

The conventional voice interactive navigation system described in Patent Document 2 carries out a call between an in-vehicle terminal and a voice dialogue server and performs voice recognition processing on the server side.

Patent Document 1: Japanese Patent Laid-Open No. 2-77799
Patent Document 2: JP 2002-318132 A

In the in-vehicle voice recognition device described in Patent Document 1, it is difficult to remove noise completely because there are many sources of noise inside and outside the vehicle.

In the voice interactive navigation system described in Patent Document 2, it takes time to obtain a recognition result because of the communication time with the server and the recognition processing time. Moreover, when voice recognition fails at the server, the user must repeat the input and the data must be retransmitted, so wasteful communication occurs, operability deteriorates, and communication charges rise.

The present invention has been made to solve the above problems. Its object is to obtain an information terminal that, when voice recognition processing is performed on the server side, can increase the reliability of the voice data transmitted to the server and reduce invalid data communication.
Another object is to obtain a voice recognition server that can reduce wasteful data communication with the information terminal.
A further object is to obtain a voice recognition system using the information terminal and the voice recognition server described above.

The information terminal according to the present invention includes a determination unit that determines whether voice data input via a voice input unit will be recognized correctly, and a communication unit that transmits to the voice recognition server only the voice data that the determination unit has determined will be recognized correctly.

The voice recognition server according to the present invention includes: a voice recognition unit that performs voice recognition on voice data received from an information terminal; a recognition result determination unit that compares the recognition result from the voice recognition unit with dictionary data and outputs a reliability for the recognition result; a response data creation unit that creates response data containing a response to the received voice data when the reliability is at or above a certain threshold, and creates response data asking whether the recognition result for the received voice data is correct when the reliability does not reach the threshold; and a communication unit that transmits the response data created by the response data creation unit to the information terminal.

In the voice recognition system according to the present invention, the information terminal determines, in its determination unit, whether voice data input via the voice input unit will be recognized correctly by the voice recognition server, and transmits to the voice recognition server, via its communication unit, only the voice data that the determination unit has determined will be recognized correctly. The voice recognition server performs, in its voice recognition unit, voice recognition on the voice data received from the information terminal via its communication unit, and outputs, in its recognition result determination unit, a reliability for the recognition result by comparing the recognition result with dictionary data. When the reliability is at or above a certain threshold, the response data creation unit creates response data containing a response to the received voice data; when the reliability does not reach the threshold, it creates response data asking whether the recognition result for the received voice data is correct. The server then transmits the response data to the information terminal via its communication unit.

According to the present invention, the information terminal determines the amount of noise in the input voice data itself, judges whether the voice data can be recognized based on information about the various factors that cause noise, and transmits to the voice recognition server only the voice data judged to be recognizable. This eliminates voice recognition failures at the voice recognition server as far as possible and reduces wasteful communication.

According to the present invention, when the voice recognition server determines that voice recognition has succeeded, it transmits to the information terminal response data containing a response to the transmitted data; when it determines that voice recognition has failed, it transmits response data asking the information terminal to confirm the content of the transmitted data. Wasteful data transmission from the voice recognition server can therefore be avoided.

According to the present invention, the information terminal judges whether voice data can be recognized based on the amount of noise in the input voice data itself and on information about the various factors that cause noise, and transmits to the voice recognition server only the voice data judged to be recognizable. When the voice recognition server determines that recognition of the received voice data has succeeded, it transmits response data containing a response to the transmitted data; when it determines that recognition has failed, it transmits response data asking the information terminal to confirm the content of the transmitted data. This yields a voice recognition system that reduces invalid data communication and improves the user's operating efficiency.

Various embodiments of the present invention are described below.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of an in-vehicle voice recognition system 100 according to Embodiment 1 of the present invention. The voice recognition system 100 includes an in-vehicle information device (information terminal) 200 and a voice recognition server 300, which are connected via a wireless communication line.

The in-vehicle information device 200 includes a vehicle information acquisition unit 201, a noise amount determination unit 202, a determination unit (noise amount determination unit, determination unit) 203, a threshold storage unit 204, a voice input unit 205, an S/N ratio acquisition unit 206, a voice data determination unit 207, a voice output unit 208, a control unit 209, a communication unit 210, a recognition result determination unit 211, and a determination condition learning unit 212. The in-vehicle information device 200 is connected to a microphone 11 of the in-vehicle information terminal via the voice input unit 205 and to a speaker 10 via the voice output unit 208. Via the vehicle information acquisition unit 201, it is also connected to a speedometer 12, a window opening/closing device 13, a wiper drive device 14, a turn signal (winker) 15, a car navigation device 16, in-vehicle audio equipment 17, and an in-vehicle air conditioner 18. The control unit 209 controls the in-vehicle information device 200 as a whole.

The vehicle information acquisition unit 201, noise amount determination unit 202, determination unit 203, threshold storage unit 204, voice input unit 205, S/N ratio acquisition unit 206, voice data determination unit 207, voice output unit 208, control unit 209, communication unit 210, recognition result determination unit 211, and determination condition learning unit 212 form part of the central processing unit of the in-vehicle information device 200 and correspond to modules of the program that controls the operation of that central processing unit.

The voice recognition server 300 includes a communication unit 301, a control unit 302, a voice recognition unit 303, a recognition result determination unit 304, and a response data creation unit 305. A recognition dictionary storage unit 306, a storage device that holds a recognition dictionary database, is connected to the voice recognition server 300. The control unit 302 controls the voice recognition server 300 as a whole.

The communication unit 301, control unit 302, voice recognition unit 303, recognition result determination unit 304, and response data creation unit 305 form part of the central processing unit of the voice recognition server 300 and correspond to modules of the program that controls the operation of that central processing unit.

Next, the operation is described.
FIG. 2 is a flowchart of voice recognition processing by the in-vehicle information device 200 according to Embodiment 1.
When voice input processing starts, the user's speech is captured via the microphone 11 (step ST101), and voice data is input to the voice input unit 205 (step ST102).

Next, the S/N ratio acquisition unit 206 acquires the S/N ratio (N) of the input voice data (step ST103). The S/N ratio (N) is calculated by comparing the ambient noise signal that the in-vehicle information device 200 acquired immediately before voice input processing started with the voice signal input in step ST102.
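The text does not say how the two signals are compared. One possible reading of step ST103 is sketched below, assuming both signals are available as lists of amplitude samples and that the ratio is expressed in decibels as a ratio of mean powers. The function names `mean_power` and `snr_db`, and the handling of silent input, are illustrative assumptions rather than details from the patent.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean of the squared sample amplitudes."""
    if len(samples) == 0:
        raise ValueError("samples must not be empty")
    return sum(x * x for x in samples) / len(samples)


def snr_db(noise: Sequence[float], speech: Sequence[float]) -> float:
    """Ratio of utterance power to pre-utterance noise power, in decibels.

    `noise` is the ambient signal captured just before voice input started;
    `speech` is the signal captured during the utterance (step ST102).
    """
    noise_power = mean_power(noise)
    speech_power = mean_power(speech)
    if noise_power == 0.0:
        return math.inf   # silent background (assumed handling)
    if speech_power == 0.0:
        return -math.inf  # empty utterance (assumed handling)
    return 10.0 * math.log10(speech_power / noise_power)
```

Because the utterance segment also contains the background noise, this is strictly a (signal plus noise) to noise ratio; a fuller implementation might subtract the noise power first.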

Next, the voice data determination unit 207 compares the S/N ratio (N) acquired in step ST103 with a threshold (N0) (step ST104). If it is determined in step ST104 that the S/N ratio (N) is greater than or equal to the threshold (N0), the input voice data is judged to be noisy and prone to misrecognition, and the user is prompted via the speaker 10 or the like to repeat the input.

If it is determined in step ST104 that the S/N ratio (N) is smaller than the threshold (N0), the vehicle information acquisition unit 201 acquires information on the various factors that cause noise in the vehicle (step ST105).
For example, the traveling speed is acquired from the speedometer 12. The open/closed state of the windows is acquired from the window opening/closing device 13: whether a window is open or closed and, if open, how far. From the wiper drive device 14, information is acquired on whether the wipers are moving or stopped and, if moving, how fast. From the turn signal 15, information is acquired on whether the turn signal is operating or stopped. From the car navigation device 16, information is acquired on the current travel point, that is, the travel point at the time the voice data was input; this includes, for example, the condition of the road being traveled (such as whether it is paved) and whether there is a construction site in the immediate vicinity. From the in-vehicle audio equipment 17, the set volume is acquired if the equipment is in use. From the in-vehicle air conditioner 18, the set air flow rate is acquired if the air conditioner is running.

The noise amount determination unit 202 obtains a noise amount determination value S for each factor, based on the information acquired by the vehicle information acquisition unit 201 about the various factors that cause noise (step ST105). The determination value S may be a quantity such as the traveling speed, acquired by the noise amount determination unit 202, used as it is, or a score obtained by dividing the window open/closed state, the wiper operating state, and so on into several levels. For example, the window state may be scored as 0 points if closed, 1 point if half or less open, 2 points if half or more open, and 3 points if fully open. Likewise, the wipers may be scored as 0 points if stopped, 1 point if moving at the slowest setting, 2 points at the intermediate speed, and 3 points at the fastest speed.
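The scoring described above can be sketched as follows. This is a minimal illustration: the representation of the window opening as a fraction from 0.0 to 1.0, the names `window_score`, `WiperSpeed`, and `determination_values`, and the choice to give exactly half open 1 point (the text lists "half" under both 1 and 2 points) are assumptions.

```python
from enum import IntEnum
from typing import Dict


class WiperSpeed(IntEnum):
    """Wiper states, numbered with the scores given in the text."""
    STOPPED = 0
    SLOW = 1
    MEDIUM = 2
    FAST = 3


def window_score(open_fraction: float) -> int:
    """Score the window opening: 0 closed, 1 up to half, 2 over half, 3 fully open."""
    if not 0.0 <= open_fraction <= 1.0:
        raise ValueError("open_fraction must be between 0.0 and 1.0")
    if open_fraction == 0.0:
        return 0
    if open_fraction == 1.0:
        return 3
    # Exactly half open is scored as 1 here (assumption; the text is ambiguous).
    return 1 if open_fraction <= 0.5 else 2


def determination_values(speed_kmh: float, window_open_fraction: float,
                         wiper: WiperSpeed) -> Dict[str, float]:
    """Collect one determination value S per noise factor."""
    return {
        "speed_kmh": speed_kmh,                        # used as it is
        "window": window_score(window_open_fraction),  # scored in levels
        "wiper": int(wiper),                           # scored in levels
    }
```

For instance, `determination_values(60.0, 0.75, WiperSpeed.SLOW)` yields a speed of 60.0, a window score of 2, and a wiper score of 1.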

Next, the determination unit 203 obtains the noise amount determination values S obtained in step ST105 and also obtains the noise amount threshold S0 for each factor from the threshold storage unit 204 (step ST106).

FIG. 3 is a diagram explaining how the noise amount threshold S0 stored in the threshold storage unit 204 is set. Here, the traveling speed of the vehicle is taken as an example of a factor that causes noise in the vehicle.
The threshold storage unit 204 holds thresholds S0 derived from a predetermined limit value of the recognition rate or the noise amount. In the example shown in the figure, the limit value A for deciding whether to transmit to the voice recognition server 300 is a voice recognition rate of 50% or higher; in this case, 80 km/h, the speed at which the voice recognition rate is 50%, becomes the traveling speed threshold S0.
For factors other than the traveling speed, such as the window open/closed state, acceleration state, audio volume, wiper operating state, turn signal operating state, air conditioner air flow, and travel point condition, the value that satisfies the limit value A is likewise taken as the threshold S0.
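One way to read the threshold setting described for FIG. 3 is as finding the point where a measured recognition-rate curve falls to the limit value A. The sketch below assumes the curve is given as (factor value, recognition rate) pairs with a non-increasing rate and uses linear interpolation between points. The function name and the sample curve are hypothetical; only the 50% limit and the 80 km/h result come from the text.

```python
from typing import Sequence, Tuple


def threshold_from_curve(points: Sequence[Tuple[float, float]], limit: float) -> float:
    """Return the factor value at which the recognition rate drops to `limit`.

    `points` holds (factor value, recognition rate) pairs sorted by increasing
    factor value, with the recognition rate assumed not to increase.
    """
    if not points:
        raise ValueError("points must not be empty")
    if points[0][1] < limit:
        raise ValueError("recognition rate is below the limit from the start")
    for (x0, r0), (x1, r1) in zip(points, points[1:]):
        if r0 >= limit >= r1 and r0 != r1:
            # Linear interpolation between the two neighbouring measurements.
            return x0 + (r0 - limit) * (x1 - x0) / (r0 - r1)
    raise ValueError("recognition rate never drops to the limit")


# Hypothetical measurements (speed in km/h, recognition rate); only the
# 80 km/h at 50% point is taken from the description of FIG. 3.
SPEED_CURVE = [(0.0, 0.95), (40.0, 0.80), (80.0, 0.50), (120.0, 0.20)]
```

With this illustrative curve, `threshold_from_curve(SPEED_CURVE, 0.50)` gives approximately 80 km/h, matching the traveling speed threshold S0 in the example.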

When there are several factors that cause noise, a more appropriate threshold S0 can be obtained by setting the threshold S0 so that the limit value of the recognition rate or noise amount is satisfied with those conditions combined. FIG. 4 shows an example of how the traveling speed threshold S0 is set when the traveling speed and the window open/closed state are combined. As shown in the figure, the traveling speed threshold S0 is 80 km/h with the windows closed, whereas the traveling speed threshold S0 that satisfies the 50% recognition rate limit is 70 km/h with the windows half open and 65 km/h with them fully open.
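The combined condition of FIG. 4 can be pictured as a lookup from window state to speed threshold. The three threshold values below are the ones quoted in the text; the table layout, the state labels, and the function names are assumptions made for illustration.

```python
from typing import Dict

# Traveling speed threshold S0 (km/h) per window state, as quoted for FIG. 4.
SPEED_THRESHOLD_BY_WINDOW: Dict[str, float] = {
    "closed": 80.0,
    "half_open": 70.0,
    "fully_open": 65.0,
}


def speed_threshold(window_state: str) -> float:
    """Look up the speed threshold S0 that applies to the given window state."""
    return SPEED_THRESHOLD_BY_WINDOW[window_state]


def speed_allows_transmission(speed_kmh: float, window_state: str) -> bool:
    """True when the speed is below the threshold for the current window state."""
    return speed_kmh < speed_threshold(window_state)
```

The same speed can then pass or fail depending on the window: at 68 km/h, `speed_allows_transmission` is true with the windows closed or half open, and false with them fully open.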

The determination unit 203 compares the determination value S with the threshold S0 for each factor that causes noise (step ST107).
If it is determined in step ST107 that the determination value S is greater than or equal to the threshold S0, the input voice data is judged to contain a large amount of noise and to be prone to misrecognition; the voice data is not transmitted to the voice recognition server 300, and the user is prompted to repeat the input. In the example of FIG. 3, if the traveling speed acquired in step ST105 is 80 km/h or more, the voice data is not transmitted to the voice recognition server 300 and the user is prompted to repeat the input.

If it is determined in step ST107 that the determination value S is smaller than the threshold S0, the communication unit 210 transmits the voice data to the voice recognition server 300 (step ST108). In the example of FIG. 3, if the traveling speed acquired in step ST105 is 60 km/h, for instance, it is below the threshold of 80 km/h, so the voice data is transmitted to the voice recognition server 300.
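Steps ST107 and ST108 amount to a per-factor comparison followed by a send or re-prompt decision. The sketch below assumes that a single factor at or above its threshold is enough to withhold the data; the text says each factor is compared but does not spell out how the individual results are combined. The function names and dictionary keys are illustrative.

```python
from typing import Dict, List


def factors_over_threshold(values: Dict[str, float],
                           thresholds: Dict[str, float]) -> List[str]:
    """Names of the noise factors whose determination value S is >= S0.

    Every factor in `values` must have an entry in `thresholds`.
    """
    return sorted(name for name, s in values.items() if s >= thresholds[name])


def should_transmit(values: Dict[str, float],
                    thresholds: Dict[str, float]) -> bool:
    """True when every determination value S is below its threshold S0.

    Assumption: any single factor at or above its threshold blocks sending.
    """
    return not factors_over_threshold(values, thresholds)
```

With a speed threshold of 80 km/h, a speed of 60 km/h allows transmission (ST108), while 80 km/h or more leads to a re-input prompt instead.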

When the voice recognition server 300 has performed the voice recognition processing described later and sends response data, the communication unit 210 receives the response data (step ST109).

The recognition result determination unit 211 examines the content of the received response data and determines whether voice recognition was performed correctly at the voice recognition server 300 (step ST110).
If it is determined in step ST110 that voice recognition by the voice recognition server 300 succeeded, the determination condition learning unit 212 raises the threshold S0 stored in the threshold storage unit 204 (step ST111). If it is determined in step ST110 that voice recognition failed, the unit lowers the threshold S0 (step ST112).

The determination condition learning unit 212 updates the threshold S0 held in the threshold storage unit 204 for each vehicle state, based on whether voice recognition at the voice recognition server 300 succeeded or failed. For example, where the traveling speed threshold S0 was set to 80 km/h as in FIG. 3, the threshold S0 is lowered to 60 km/h if voice recognition at the voice recognition server 300 fails. This tightens the condition for deciding whether to transmit voice data to the voice recognition server 300, so that only voice data of higher reliability is transmitted.
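The update performed in steps ST111 and ST112 might look like the following. The text gives only one concrete figure (80 km/h lowered to 60 km/h after a failure); using the same step size for raising the threshold, and clamping the result between a lower and an upper bound, are assumptions added for this sketch, as is the function name.

```python
def update_threshold(s0: float, recognized: bool, step: float,
                     lower: float, upper: float) -> float:
    """Raise S0 after a successful recognition, lower it after a failure.

    `step`, `lower`, and `upper` are illustrative parameters; the clamp keeps
    the threshold within a plausible range for the factor.
    """
    candidate = s0 + step if recognized else s0 - step
    return max(lower, min(upper, candidate))
```

With a step of 20 km/h, `update_threshold(80.0, False, 20.0, 0.0, 120.0)` reproduces the example in the text by returning 60.0.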

Next, the operation of the voice recognition server 300 is described.
FIG. 5 is a flowchart of voice recognition processing by the voice recognition server 300 according to Embodiment 1.
The communication unit 301 receives the voice data that the in-vehicle information device 200 transmitted in step ST108 (step ST201).

Next, the voice recognition unit 303 performs voice recognition processing on the received voice data based on the dictionary data stored in the recognition dictionary storage unit 306 (step ST202).

Next, the recognition result determination unit 304 outputs a score s (reliability) indicating how closely the result of the voice recognition performed in step ST202 matched the dictionary data stored in the recognition dictionary storage unit 306 (step ST203).

Next, the score s obtained in step ST203 is compared with a preset score threshold s0 (step ST204).
If it is determined in step ST204 that the score s is greater than or equal to the threshold s0, misrecognition is judged to be likely, and the response data creation unit 305 creates response data asking the user to confirm whether the voice recognition result is correct (step ST206).
If, on the other hand, it is determined in step ST204 that the score s is smaller than the threshold s0, voice recognition is judged likely to have succeeded, and response data containing a response based on the recognition result is created (step ST205).
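The branch between steps ST205 and ST206 can be sketched as below. The sketch follows the comparison direction stated for step ST204 in this embodiment (a score s below s0 is treated as success). The summary of the invention describes the reliability the other way round (at or above the threshold means success), so the direction is worth checking against FIG. 5 before reuse. The `Response` type, the function name, and the wording of the confirmation message are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Response:
    kind: str  # "result" (ST205) or "confirm" (ST206)
    text: str


def create_response(recognized_text: str, score: float, score_threshold: float,
                    results: List[str]) -> Response:
    """Build response data from the recognition result and its score s."""
    if score < score_threshold:
        # ST205: recognition judged successful, so answer the request itself.
        return Response("result", "; ".join(results))
    # ST206: misrecognition judged likely, so ask the user to confirm.
    return Response("confirm", f"Did you say: {recognized_text}?")
```

A `"result"` response carries the answer directly, whereas a `"confirm"` response costs one extra round trip but avoids sending an answer to a request the user never made.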

Next, the communication unit 301 transmits the response data created in step ST205 or step ST206 to the in-vehicle information device 200 (step ST207). The in-vehicle information device 200 receives the transmitted response data in step ST109. In step ST110, the recognition result determination unit 211 of the in-vehicle information device 200 determines, from the content of the received response data, whether voice recognition at the voice recognition server 300 succeeded or failed.

The flow of voice recognition processing between the in-vehicle information device 200 and the voice recognition server 300 according to Embodiment 1 is described with reference to FIGS. 6 and 7. FIG. 6 shows the sequence when the voice recognition server 300 finds in step ST204 that the score s of the voice recognition result is smaller than the threshold s0, that is, when voice recognition is judged to have succeeded. For example, when the in-vehicle information device 200 transmits voice data saying "nearby ramen shop" to the voice recognition server 300 and the server judges that the score s of the recognition result is smaller than the threshold s0, the server responds to the request in the voice data by searching for ramen shops and transmits the resulting list of ramen shops to the in-vehicle information device 200 as response data. The voice output unit 208 of the in-vehicle information device 200 outputs the received response data through the speaker 10. Because voice recognition at the voice recognition server 300 has succeeded, the determination condition learning unit 212 of the in-vehicle information device 200 raises the threshold S0 stored in the threshold storage unit 204.

FIG. 7, on the other hand, shows the sequence when the voice recognition server 300 finds in step ST204 that the score s of the voice recognition result is greater than or equal to the threshold s0, that is, when voice recognition is judged to have failed. In this case the voice recognition server 300 may have misrecognized the content of the voice data, so in response to the utterance "nearby ramen shop" from the in-vehicle information device 200, the response data creation unit 305 creates confirmation response data such as "Did you say 'nearby ramen shop'?" and transmits it to the in-vehicle information device 200. The voice output unit 208 of the in-vehicle information device 200 outputs the received response data through the speaker 10. When the user answers the output voice with, for example, "Yes", voice data saying "Yes" is transmitted to the voice recognition server 300. The voice recognition server 300 confirms the "Yes" voice data and transmits the list of ramen shops found by the search to the in-vehicle information device 200. This minimizes the communication steps needed before the user obtains the desired data and avoids wasteful communication caused by misrecognition.

As described above, according to Embodiment 1, in the in-vehicle information device 200 the voice data determination unit 207 determines the amount of noise in the input voice data itself, the determination unit 203 judges whether the voice data can be recognized by taking the various noise-causing factors together, and transmission to the voice recognition server 300 is controlled accordingly. Voice recognition failures at the voice recognition server 300 can therefore be eliminated as far as possible. This reduces wasteful communication and lowers communication charges. It also spares the user needless operations caused by misrecognition, improving operating efficiency.

In addition, when the voice recognition server 300 determines that voice recognition has succeeded, it transmits to the in-vehicle information device 200 response data containing the response to the transmitted data. When it determines that voice recognition has failed, it transmits to the in-vehicle information device 200 response data that asks for confirmation of the content of the transmitted data. Unnecessary data transmission from the voice recognition server 300 can thus be avoided.
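The two-way branching of the server response can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `create_response`, the `recognition_ok` flag, the dictionary layout of the response data, and the `demo_search` stand-in are all hypothetical. How the server arrives at the success or failure judgment (the score comparison in step ST204) is deliberately left outside the sketch.

```python
from typing import Callable, Dict, List


def create_response(recognized_text: str,
                    recognition_ok: bool,
                    search: Callable[[str], List[str]]) -> Dict[str, object]:
    """Build response data in the manner of the response data creation unit 305.

    All names and the dictionary layout are hypothetical; only the
    two-way branching is taken from the description.
    """
    if recognition_ok:
        # Recognition judged successful: answer the request directly.
        return {"type": "result", "items": search(recognized_text)}
    # Recognition judged to have failed: ask the user to confirm first,
    # so that no search results are sent for a possibly wrong request.
    return {"type": "confirm", "prompt": f"Did you say: {recognized_text}?"}


def demo_search(query: str) -> List[str]:
    # Stand-in for the real search back end (an assumption).
    return [f"{query} A", f"{query} B"]
```

With this shape, the terminal can tell from the `type` field whether it received a final answer or a confirmation prompt, which is the distinction the recognition result determination unit 211 relies on.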

Furthermore, in the in-vehicle information device 200, the recognition result determination unit 211 determines, based on the response data received from the voice recognition server 300, whether voice recognition at the voice recognition server 300 succeeded or failed. On success, the threshold stored in the threshold storage unit 204 is raised; on failure, it is lowered. This further increases the reliability of the voice data transmitted to the voice recognition server 300 while preventing the transmission conditions from becoming stricter than necessary.
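The threshold adjustment can be illustrated with a short sketch. The text states only the direction of the change (raise on success, lower on failure); the class name `ThresholdStore`, the fixed step size, and the lower and upper bounds are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThresholdStore:
    """Stand-in for the threshold storage unit 204, holding one threshold."""
    s0: float               # current threshold for a noise amount value S
    step: float = 1.0       # adjustment step (assumed; not given in the text)
    s0_min: float = 0.0     # assumed lower bound
    s0_max: float = 100.0   # assumed upper bound

    def learn(self, recognition_succeeded: bool) -> float:
        """Raise the threshold on success, lower it on failure."""
        if recognition_succeeded:
            # Success: relax the transmission condition.
            self.s0 = min(self.s0 + self.step, self.s0_max)
        else:
            # Failure: make the transmission condition stricter.
            self.s0 = max(self.s0 - self.step, self.s0_min)
        return self.s0
```

Clamping to a range is one way to keep repeated failures or successes from driving the threshold to a value where nothing, or everything, is transmitted; the description does not say whether such bounds are used.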

Also, in the first embodiment, the function of the car navigation device 16 is used to acquire information on the travel point, which is then used in determining the amount of noise. The car navigation device 16 is thus put to effective use to realize a more accurate voice recognition system.

Embodiment 2.
FIG. 8 is a block diagram showing the configuration of a voice recognition system 500 on a home network according to the second embodiment of the present invention. The voice recognition system 500 includes an information terminal 600 on a home network 900 and a voice recognition server 700. The information terminal 600 and the voice recognition server 700 are connected via a communication line. The configurations of the information terminal 600 and the voice recognition server 700 are the same as those of the in-vehicle information device 200 and the voice recognition server 300 of the first embodiment.

The information terminal 600 is connected to the home server 800 of the home network 900. Connected to the home server 800 are, for example, a television 801, an audio device 802, a washing machine 803, an air conditioner 804, and a personal computer 805. The home server 800 centrally manages these electrical appliances and information devices in the house.

Next, the operation of the voice recognition system 500 according to the second embodiment will be described.
The operations of the information terminal 600 and the voice recognition server 700 are the same as those shown in the flowcharts of FIGS. 2 and 5 of the first embodiment. In the second embodiment, a function equivalent to the vehicle information acquisition unit 201 of the in-vehicle information device 200 acquires, via the home server 800, information on the elements that cause noise. Specifically, this information includes the volume of the television 801, the volume of the audio device 802, the operating state of the washing machine 803, and the air volume of the air conditioner 804. The noise amount determination unit 202 outputs a noise amount determination value S for each element based on the acquired information. The determination unit 203 compares each noise amount determination value S with the threshold S0 for that element held in the threshold storage unit 204, and determines whether or not to transmit the voice data to the voice recognition server 700.
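The per-element comparison can be sketched as follows. The text says only that each noise amount determination value S is compared with its threshold S0; the rule used here (transmit only when every S is strictly below its S0), the function name `should_transmit`, and the element keys are assumptions made for illustration.

```python
from typing import Dict


def should_transmit(noise_values: Dict[str, float],
                    thresholds: Dict[str, float]) -> bool:
    """Decide whether voice data may be sent to the voice recognition server.

    noise_values maps each noise source to its determination value S;
    thresholds maps the same keys to the stored threshold S0.
    Assumed rule: transmit only if every S is below its S0.
    """
    return all(noise_values[name] < thresholds[name] for name in noise_values)


# Hypothetical values for the appliances managed by the home server 800.
example_thresholds = {"tv_volume": 30.0, "audio_volume": 30.0,
                      "washing_machine": 1.0, "aircon_airflow": 3.0}
quiet_room = {"tv_volume": 10.0, "audio_volume": 0.0,
              "washing_machine": 0.0, "aircon_airflow": 1.0}
loud_tv = dict(quiet_room, tv_volume=45.0)
```

Under these example values, `should_transmit(quiet_room, example_thresholds)` allows transmission, while `should_transmit(loud_tv, example_thresholds)` blocks it because the television volume alone exceeds its threshold.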

As described above, according to the second embodiment, when the user interacts by voice with the voice recognition server 700 through the information terminal 600 inside a house, the information terminal 600 uses the home server 800 to collect status information on the various electrical appliances that cause noise in the house, determines the amount of noise in the voice data, and controls transmission to the voice recognition server 700. Voice recognition failures at the voice recognition server 700 can therefore be eliminated as far as possible. This reduces wasted communication and lowers communication charges. It also spares the user wasted operations caused by misrecognition, which improves operating efficiency.

Note that in the second embodiment the client of the voice recognition system 500 is implemented in the information terminal 600, but it may instead be implemented in an electrical appliance such as the television 801 or the personal computer 805, or in the home server 800.

FIG. 1 is a block diagram showing the configuration of the in-vehicle voice recognition system according to the first embodiment of the present invention.
FIG. 2 is a flowchart of the processing of the in-vehicle information device according to the first embodiment of the present invention.
FIG. 3 is a diagram for explaining the noise amount determination conditions according to the first embodiment of the present invention.
FIG. 4 is a diagram for explaining the noise amount determination conditions according to the first embodiment of the present invention.
FIG. 5 is a flowchart of the processing of the voice recognition server according to the first embodiment of the present invention.
FIG. 6 is a diagram showing a sequence of the voice recognition processing according to the first embodiment of the present invention.
FIG. 7 is a diagram showing a sequence of the voice recognition processing according to the first embodiment of the present invention.
FIG. 8 is a block diagram showing the configuration of the voice recognition system on a home network according to the second embodiment of the present invention.

Explanation of symbols

10 speaker; 11 microphone; 12 speedometer; 13 window opening/closing device; 14 wiper drive device; 15 winker; 16 car navigation device; 17 in-vehicle audio device; 18 in-vehicle air conditioner; 100, 500 voice recognition system; 200 in-vehicle information device (information terminal); 201 vehicle information acquisition unit; 202 noise amount determination unit; 203 determination unit (noise amount determination unit, determination unit); 204 threshold storage unit; 205 voice input unit; 206 S/N ratio acquisition unit; 207 voice data determination unit (determination unit); 208 voice output unit; 209 control unit; 210 communication unit; 211 recognition result determination unit; 212 determination condition learning unit; 300, 700 voice recognition server; 301 communication unit; 302 control unit; 303 voice recognition unit; 304 recognition result determination unit; 305 response data creation unit; 306 recognition dictionary storage unit; 600 information terminal; 800 home server; 801 television; 802 audio device; 803 washing machine; 804 air conditioner; 805 personal computer; 900 home network.

Claims

An information terminal comprising:
a determination unit that determines whether or not voice data input through a voice input unit will be correctly voice-recognized; and
a communication unit that transmits to a voice recognition server only the voice data that the determination unit has determined will be correctly voice-recognized.

The information terminal according to claim 1, wherein the determination unit comprises:
a voice data determination unit that determines the reliability of the voice data itself based on the SN ratio between the input voice data and the ambient noise at the time the voice data is input; and
a noise amount determination unit that determines whether correct voice recognition of the voice data is possible based on information about elements that cause noise.

The information terminal according to claim 2, wherein the information terminal is an in-vehicle information device, and the noise amount determination unit determines whether correct voice recognition is possible based on information about the travel point at the time of voice data input, acquired from a car navigation system.

The information terminal according to claim 2, wherein the information terminal is an in-vehicle information device, and the noise amount determination unit determines whether correct voice recognition is possible based on the traveling speed at the time of voice data input, acquired from a speedometer.

The information terminal according to claim 2, wherein the information terminal is an in-vehicle information device, and the noise amount determination unit determines whether correct voice recognition is possible based on the open/closed state of the vehicle window at the time of voice data input.

The information terminal according to claim 2, wherein the information terminal is an in-vehicle information device, and the noise amount determination unit determines whether correct voice recognition is possible based on the volume of the audio device at the time of voice data input.

The information terminal according to claim 2, wherein the information terminal is an in-vehicle information device, and the noise amount determination unit determines whether correct voice recognition is possible based on the operating state of the wiper at the time of voice data input.

The information terminal according to claim 2, wherein the information terminal is an in-vehicle information device, and the noise amount determination unit determines whether correct voice recognition is possible based on whether or not the winker is operating at the time of voice data input.

The information terminal according to claim 2, wherein the information terminal is an in-vehicle information device, and the noise amount determination unit determines whether correct voice recognition is possible based on the air volume of the in-vehicle air conditioner at the time of voice data input.

The information terminal according to claim 2, wherein the information terminal is an information terminal on a home network, and the noise amount determination unit determines whether correct voice recognition is possible based on the operating state of a device connected to the home network at the time of voice data input.

The information terminal according to any one of claims 2 to 10, further comprising:
a recognition result determination unit that determines, based on the content of the response data from the voice recognition server, whether the voice data transmitted to the voice recognition server was correctly voice-recognized by the voice recognition server; and
a determination condition learning unit that relaxes the condition under which the noise amount determination unit determines that the voice data can be recognized when the recognition result determination unit determines that the voice data was correctly recognized, and makes that condition stricter when it determines that the voice data was not correctly recognized.

A voice recognition server comprising:
a voice recognition unit that performs voice recognition of voice data received from an information terminal;
a recognition result determination unit that compares the recognition result of the voice recognition unit with dictionary data and outputs the reliability of the recognition result;
a response data creation unit that creates response data containing a response to the received voice data when the reliability is greater than or equal to a threshold, and creates response data asking whether the recognition result of the received voice data is correct when the reliability does not reach the threshold; and
a communication unit that transmits the response data created by the response data creation unit to the information terminal.

A voice recognition system comprising a voice recognition server and an information terminal, wherein:
in the information terminal, a determination unit determines whether or not voice data input via a voice input unit will be correctly recognized by the voice recognition server, and only the voice data determined by the determination unit to be correctly recognizable is transmitted to the voice recognition server via a communication unit; and
in the voice recognition server, a voice recognition unit performs voice recognition of the voice data received from the information terminal via a communication unit, a recognition result determination unit compares the recognition result of the voice recognition unit with dictionary data and outputs the reliability of the recognition result, a response data creation unit creates response data containing a response to the received voice data when the reliability is greater than or equal to a threshold and creates response data asking whether the recognition result of the received voice data is correct when the reliability does not reach the threshold, and the response data is transmitted to the information terminal via the communication unit.