JP2006330577A - Device and method for speech recognition

Info

Publication number: JP2006330577A
Application number: JP2005157309A
Authority: JP
Inventors: Masaru Okochi; 優大河内; Toshiyuki Momomoto; 利行百本
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2005-05-30
Filing date: 2005-05-30
Publication date: 2006-12-07

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device and a speech recognition method that increase the possibility that a desired facility can be extracted, and shorten the processing time for speech recognition, even when a user abbreviates a facility name when pronouncing it.

SOLUTION: Principal portions of facility names are registered in a speech recognition dictionary as objects of speech recognition, which increases the possibility that the desired facility is extracted. Compared with registering every candidate reading that may be mispronounced or abbreviated, fewer candidate readings are registered, which reduces the amount of data in the speech recognition dictionary. Further, recognition processing targets only the dictionary whose dictionary ID is specified by the current position, so fewer candidates are searched and the processing time for speech recognition is shorter than when all speech recognition dictionaries are searched.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a speech recognition device and a speech recognition method that recognize speech input by a user through a microphone or the like and search for a facility name or the like.

In recent years, it has become possible to enter the name of a destination facility into a navigation device using a speech recognition function. In such a navigation device, however, the facility corresponding to the input speech cannot be extracted unless the speech input by the user through a microphone or the like exactly matches the reading of a recognition word registered in the speech recognition dictionary. Consequently, when the user abbreviates the facility name when pronouncing it, for example, the desired facility cannot be retrieved. To address this, a technique is known in which, for a given facility, not only the formal reading but also readings that are likely to be mispronounced or abbreviated are registered in the speech recognition dictionary (see, for example, Patent Document 1).
Patent Document 1: JP-A-9-42988

However, the speech recognition device described in Patent Document 1 can handle only readings registered in the speech recognition dictionary in advance. For a given facility, it is therefore necessary to register not only the formal reading but also many readings that may be mispronounced or abbreviated, which increases the amount of data in the speech recognition dictionary. In addition, as the number of such candidate readings grows, the processing time needed to check whether any of them matches the input speech becomes longer.

Meanwhile, a technique is known in which the phoneme data of the place names and facility names contained in each block divided by latitude and longitude, together with the related data, is recorded block by block as a block-unit speech recognition dictionary, and a search is performed over the block-unit speech recognition dictionaries corresponding to the block to which the current position belongs and the blocks within a predetermined surrounding range (see, for example, Patent Document 2).
Patent Document 2: Japanese Patent Laid-Open No. 2003-4470

However, although the speech recognition device described in Patent Document 2 can limit the facilities subject to speech recognition to those belonging to the block that contains the current position, it cannot extract the desired facility when, for example, the user abbreviates the facility name when pronouncing it.

The present invention has been made to solve these problems. Its object is to increase the possibility that a desired facility can be extracted when a user abbreviates a facility name when pronouncing it, while suppressing any increase in the amount of data in the speech recognition dictionary as far as possible.

Another aspect of the present invention aims to increase the possibility that a desired facility can be extracted when a user abbreviates a facility name when pronouncing it, while suppressing any increase in speech recognition processing time as far as possible.

To solve the problems described above, in the present invention the main part of each facility name is registered in a speech recognition dictionary, together with the position of the facility, as a target of speech recognition. The speech recognition dictionary is searched only for facilities located within a predetermined range of the user's current position, and a facility that matches the main part pronounced by the user is extracted.

According to another aspect of the present invention, speech relating to the region corresponding to the user's current position is synthesized before and after the speech pronounced by the user, and a facility that matches the synthesized speech is extracted from the speech recognition dictionary.

According to still another aspect of the present invention, a vocabulary pattern that represents a facility name as a combination of a plurality of words is registered in a speech recognition dictionary, together with the position of the facility, as a target of speech recognition. The speech recognition dictionary is searched only for facilities located within a predetermined range of the user's current position, and a facility that matches the pattern pronounced by the user is extracted.

According to the present invention configured as described above, even when the user abbreviates the facility name when speaking, the possibility that the desired facility is extracted can be increased, because the main part of the facility name is registered in the speech recognition dictionary as a target of speech recognition. In addition, because the main part of the facility name is registered as a candidate reading, the number of candidate readings registered in the speech recognition dictionary, and hence the amount of dictionary data, can be reduced compared with registering every reading that may be mispronounced or abbreviated. Furthermore, because the user's current position narrows down the candidate readings to be searched, the speech recognition processing time can be shortened compared with searching the entire speech recognition dictionary.

According to another aspect of the present invention, even when the user abbreviates the facility name when speaking, speech obtained by combining the user's utterance with speech for the region of the user's current position is used as the speech recognition key. The possibility that the desired facility is extracted can therefore be increased even if only the formal names of facilities are registered in the speech recognition dictionary. Moreover, there is no need to register candidate readings that may be mispronounced or abbreviated; only the formal name of each facility is registered. As a result, the amount of data in the speech recognition dictionary does not increase.

According to still another aspect of the present invention, even when the user abbreviates the facility name when speaking, the possibility that the desired facility is extracted can be increased, because vocabulary patterns that represent the facility name as a combination of a plurality of words are registered in the speech recognition dictionary as targets of speech recognition. In addition, the number of candidate readings registered in the speech recognition dictionary, and hence the amount of dictionary data, can be reduced compared with individually registering every reading that may be mispronounced or abbreviated. Furthermore, because the user's current position narrows down the candidate readings to be searched, the speech recognition processing time can be shortened compared with searching the entire speech recognition dictionary.

(First embodiment)
A first embodiment of the present invention will now be described with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the speech recognition device according to the first embodiment. The speech recognition device according to this embodiment is mounted on a vehicle-mounted navigation device or the like. In FIG. 1, reference numeral 1 denotes a speech input unit, which receives the speaker's speech through a microphone or the like (not shown).

Reference numeral 2 denotes a speech recognition dictionary storage unit, which stores a speech recognition dictionary. As shown in FIG. 2, the speech recognition dictionary associates a dictionary ID (identification), a facility ID, a facility name, a formal name, and a main part name with one another. The dictionary ID corresponds to the position information recited in the claims. It is set for each region, such as a prefecture or a municipality, and facilities in the same region have the same dictionary ID. In the first embodiment the dictionary ID is set for each municipality, but it may instead be assigned according to the divisions of administrative boundaries. As the example of FIG. 2 shows, the position information of the present invention need not represent a specific position by coordinates or the like; it is sufficient that it at least distinguishes regions.

The facility ID is set for each facility, and every facility has a different facility ID. The facility name is text data indicating the formal name of the facility. The formal name is phoneme piece data indicating the formal name of the facility. The main part name is phoneme piece data indicating the main part obtained by deleting from the facility name the character string that indicates a region such as a prefecture or a municipality. Examples of main part names include phoneme piece data for art museum, zoo, botanical garden, city office, town office, library, police station, fire station, courthouse, government building, school, and hospital. Adding phoneme piece data indicating a region to one of these main part names yields the formal name.

Reference numeral 3 denotes a current position detection unit, which detects the user's current position (more precisely, the current position of the speech recognition device) by a known technique using a GPS (Global Positioning System) receiver, a self-contained navigation sensor, or the like.

Reference numeral 4 denotes a speech recognition unit, which recognizes the input speech by comparing the speech input to the speech input unit 1 with the phoneme piece data of the main part names in the speech recognition dictionary stored in the speech recognition dictionary storage unit 2. Specifically, when speech is input from the speech input unit 1, the speech recognition unit 4 first determines the dictionary ID corresponding to the current position supplied by the current position detection unit 3. To do so, it determines which region the current position falls in and extracts the dictionary ID of the matching region. The speech recognition unit 4 then searches only the speech recognition dictionary bearing the dictionary ID corresponding to the current position, and identifies and extracts the facility name of the facility whose main part name has a reading that matches the input speech.

Reference numeral 5 denotes a display control unit, which displays the facility name extracted by the speech recognition unit 4 on a display unit (not shown).

In the speech recognition device configured in this way, suppose for example that the user says "Art Museum" into the microphone. The speech input unit 1 receives the speech "Art Museum". Suppose also that the current position detection unit 3 detects that the user's current position is "○○ City". Because the detected current position is "○○ City", the speech recognition unit 4 selects, from the speech recognition dictionary stored in the speech recognition dictionary storage unit 2, the entries having the dictionary ID corresponding to "○○ City" and loads them into the area where the search is actually performed, such as a work RAM (here, dictionary ID = D1 is loaded). It then performs speech recognition of the input speech "Art Museum" against the entries with dictionary ID D1, and extracts the facility name "○○ City Art Museum", whose main part name "Art Museum" matches the input speech. The display control unit 5 displays the text "○○ City Art Museum" on the display unit. The user can thus set "○○ City Art Museum" as the destination.

Next, the operation of the speech recognition device and the speech recognition method according to the first embodiment will be described. FIG. 3 is a flowchart showing the operation of the speech recognition device and the speech recognition method according to the first embodiment.

In FIG. 3, the speech recognition unit 4 first checks whether speech has been input through the speech input unit 1 via a microphone or the like (step S1). If the speech recognition unit 4 determines that no speech has been input (NO in step S1), it repeats step S1. If it determines that speech has been input (YES in step S1), the speech recognition unit 4 acquires the current position information detected by the current position detection unit 3 (step S2).

Based on the acquired current position information, the speech recognition unit 4 selects the dictionary whose dictionary ID is that of the region corresponding to the current position (step S3). The speech recognition unit 4 then checks whether the selected dictionary contains a main part name that matches the speech input from the speech input unit 1 (step S4). If the speech recognition unit 4 determines that no matching main part name exists (NO in step S4), the process ends. In that case, the display control unit 5 may generate a message such as "No corresponding facility" and display it on the display unit (not shown).

If, on the other hand, the speech recognition unit 4 determines that a main part name matching the speech input from the speech input unit 1 exists (YES in step S4), it outputs to the display control unit 5 the text data of the facility name of the facility having the recognized main part name. The display control unit 5 displays the text of that facility name on the display unit (not shown) (step S5). Note that, as shown in FIG. 2, in practice not only the main part name but also the formal name is a recognition target. In other words, it is sufficient for the input speech to match either the formal name or the main part name.
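The flow of steps S2 to S5 can be sketched roughly as follows. This is only an illustrative sketch, not the implementation described in the specification: plain text strings stand in for the phoneme piece data, exact string comparison stands in for acoustic matching, and the names `Entry`, `DICTIONARY`, `REGION_TO_DICTIONARY_ID`, and `recognize`, as well as the sample records, are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    dictionary_id: str  # region key (one ID per municipality in this sketch)
    facility_id: str
    facility_name: str  # text shown to the user
    formal_name: str    # stands in for the formal-name phoneme piece data
    main_name: str      # stands in for the main part name phoneme piece data


# Hypothetical dictionary contents, loosely modeled on FIG. 2.
DICTIONARY = [
    Entry("D1", "F001", "○○ City Art Museum", "○○ City Art Museum", "Art Museum"),
    Entry("D1", "F002", "○○ City Library", "○○ City Library", "Library"),
    Entry("D2", "F003", "△△ City Art Museum", "△△ City Art Museum", "Art Museum"),
]

# Hypothetical mapping from the detected region to its dictionary ID.
REGION_TO_DICTIONARY_ID = {"○○ City": "D1", "△△ City": "D2"}


def recognize(utterance: str, current_region: str) -> Optional[str]:
    """Select the regional dictionary (S3), match (S4), return the name (S5)."""
    dictionary_id = REGION_TO_DICTIONARY_ID.get(current_region)
    # Only entries with the selected dictionary ID are searched.
    candidates = [e for e in DICTIONARY if e.dictionary_id == dictionary_id]
    for entry in candidates:
        # Both the main part name and the formal name are recognition targets.
        if utterance in (entry.main_name, entry.formal_name):
            return entry.facility_name
    return None  # no match; the caller may show "No corresponding facility"
```

With these sample records, the same utterance "Art Museum" resolves to a different facility depending on the detected region, which is the narrowing effect the dictionary ID is meant to provide.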

As described in detail above, according to the first embodiment, the main part name of each facility is registered in the speech recognition dictionary, together with the position of the facility, as a target of speech recognition. When a desired facility such as a destination is searched for by speech recognition, the facility name of the facility whose main part name matches the speech uttered by the user is extracted from the dictionary selected for the user's current position. As a result, even when the user abbreviates the facility name when speaking, the possibility that the desired facility is extracted can be increased, because the main part name is registered in the speech recognition dictionary as a target of speech recognition. In addition, because the main part name of the facility is registered as a candidate reading, the number of candidate readings registered in the speech recognition dictionary, and hence the amount of dictionary data, can be reduced compared with registering every reading that may be mispronounced or abbreviated. Furthermore, because a dictionary is selected according to the user's current position and the number of facilities to be searched is narrowed down, the speech recognition processing time can be shortened compared with searching the entire speech recognition dictionary.

(Second embodiment)
Next, a second embodiment of the present invention will be described with reference to the drawings. FIG. 4 is a block diagram showing a configuration example of the speech recognition device according to the second embodiment. Components identical to those of the first embodiment are given the same reference numerals, and their description is partly omitted. In FIG. 4, reference numeral 6 denotes a speech synthesis unit, which synthesizes phoneme piece data for the region determined by the current position supplied by the current position detection unit 3 before and after the speech input to the speech input unit 1. Taking the speech input to the speech input unit 1 as the main part name, the speech synthesis unit 6 generates virtual facility names by synthesizing the phoneme piece data of the region corresponding to the current position (for example, a prefecture or a municipality) before and after the main part name.

For example, as shown in FIG. 5, ten patterns of virtual facility names are generated. When the main part name input through the speech input unit 1 is "Art Museum" and the current position is □□ City, ○○ Prefecture, the following ten patterns exist: pattern P1, in which "○○ Prefecture" is synthesized before "Art Museum"; pattern P2, in which "○○ Prefectural" is synthesized before it; pattern P3, in which "□□ City" is synthesized before it; pattern P4, in which "□□ Municipal" is synthesized before it; pattern P5, in which "○○ Prefecture □□ City" is synthesized before it; pattern P6, in which "○○ Prefecture □□ Municipal" is synthesized before it; pattern P7, in which nothing is synthesized with "Art Museum"; pattern P8, in which "○○ Prefecture" is synthesized after "Art Museum"; pattern P9, in which "□□ City" is synthesized after it; and pattern P10, in which "○○ Prefecture □□ City" is synthesized after it.
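The ten patterns P1 to P10 can be enumerated with a short sketch such as the one below. It is an illustration under stated assumptions rather than the specification's method: English text strings stand in for the concatenated phoneme piece data, and the function name `virtual_facility_names` and its parameters are hypothetical.

```python
from typing import List


def virtual_facility_names(main_name: str, prefecture: str, city: str) -> List[str]:
    """Return the ten candidate names P1 to P10, in that order."""
    pref = f"{prefecture} Prefecture"
    pref_run = f"{prefecture} Prefectural"
    cty = f"{city} City"
    cty_run = f"{city} Municipal"

    # Region strings synthesized before the main part name (P1 to P6).
    prefixes = [pref, pref_run, cty, cty_run, f"{pref} {cty}", f"{pref} {cty_run}"]
    # Region strings synthesized after the main part name (P8 to P10).
    suffixes = [pref, cty, f"{pref} {cty}"]

    names = [f"{prefix} {main_name}" for prefix in prefixes]
    names.append(main_name)  # P7: nothing is synthesized
    names.extend(f"{main_name} {suffix}" for suffix in suffixes)
    return names
```

Calling `virtual_facility_names("Art Museum", "○○", "□□")` yields ten strings, from "○○ Prefecture Art Museum" (P1) through "Art Museum ○○ Prefecture □□ City" (P10), with the bare "Art Museum" as P7.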

The speech recognition unit 4 performs speech recognition by comparing the speech of the virtual facility names generated by the speech synthesis unit 6 with the phoneme piece data of the formal names of facilities stored in the speech recognition dictionary storage unit 7. As shown in FIG. 6, the speech recognition dictionary stored in the speech recognition dictionary storage unit 7 associates a facility ID, a facility name (text data), and a formal name (phoneme piece data) with one another.

In the speech recognition device configured in this way, suppose for example that the user says "Art Museum" into the microphone. The speech input unit 1 receives the speech "Art Museum", and the current position detection unit 3 detects that the user's current position is "□□ City, ○○ Prefecture". Because the current position supplied by the current position detection unit 3 is "□□ City, ○○ Prefecture", the speech synthesis unit 6 generates virtual facility names by synthesizing phoneme piece data for region names such as "○○ Prefecture" and "□□ City" before and after the main part name "Art Museum" input through the speech input unit 1. So that speech is still recognized when the formal name "○○ Prefecture □□ City Art Museum" itself is input through the speech input unit 1 as the main part name, the speech synthesis unit 6 also generates the virtual facility name of pattern P7, in which nothing is synthesized before or after the main part name.

Using the phoneme piece data of the virtual facility names of patterns P1 to P10 as keys, the speech recognition unit 4 searches the speech recognition dictionary stored in the speech recognition dictionary storage unit 7 and extracts "○○ Prefecture □□ City Art Museum", whose reading matches one of the virtual facility names. The display control unit 5 displays the text "○○ Prefecture □□ City Art Museum" on the display unit. The user can thus set "○○ Prefecture □□ City Art Museum" as the destination.

Next, the operation of the speech recognition device and the speech recognition method according to the second embodiment will be described. FIG. 7 is a flowchart showing the operation of the speech recognition device and the speech recognition method according to the second embodiment.

In FIG. 7, the speech synthesis unit 6 first checks whether speech has been input through the speech input unit 1 via a microphone or the like (step S11). If the speech synthesis unit 6 determines that no speech has been input (NO in step S11), it repeats step S11. If it determines that speech has been input (YES in step S11), the speech synthesis unit 6 acquires the current position information detected by the current position detection unit 3 (step S12).

Based on the acquired current position information, the speech synthesis unit 6 generates virtual facility names by synthesizing phoneme piece data indicating the name of the region corresponding to the current position before and after the input speech (step S13). The speech recognition unit 4 then compares the generated virtual facility names with the formal names of facilities stored in the speech recognition dictionary storage unit 7 and checks whether any of them match (step S14). If the speech recognition unit 4 determines that no formal name corresponds to any virtual facility name (NO in step S14), the process ends. In that case, the display control unit 5 may generate a message such as "No corresponding facility" and display it on the display unit (not shown).

If, on the other hand, the speech recognition unit 4 determines that a formal name corresponding to a virtual facility name exists (YES in step S14), it outputs to the display control unit 5 the text data of the facility name of the facility having the recognized formal name. The display control unit 5 displays the text of that facility name on the display unit (not shown) (step S15).
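Steps S14 and S15 amount to looking up each generated virtual facility name among the registered formal names. The sketch below shows one way this might look, assuming text strings in place of phoneme piece data and exact matching in place of acoustic comparison. The names `FORMAL_NAME_DICTIONARY` and `find_facility`, and the sample records, are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical dictionary modeled on FIG. 6:
# formal-name reading -> (facility ID, facility name text).
FORMAL_NAME_DICTIONARY: Dict[str, Tuple[str, str]] = {
    "○○ Prefecture □□ City Art Museum": ("F101", "○○ Prefecture □□ City Art Museum"),
    "○○ Prefectural Library": ("F102", "○○ Prefectural Library"),
}


def find_facility(virtual_names: Iterable[str]) -> Optional[str]:
    """Return the facility name for the first virtual name that is registered."""
    for name in virtual_names:
        record = FORMAL_NAME_DICTIONARY.get(name)  # step S14
        if record is not None:
            _facility_id, facility_name = record
            return facility_name                   # passed on for display (S15)
    return None  # no formal name matched any candidate
```

Because only formal names are stored, the dictionary holds one reading per facility; the variation is generated at query time from the user's utterance and current region.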

As described in detail above, according to the second embodiment, speech for the region of the user's current position is synthesized with the speech uttered by the user, and the result is used as the key for speech recognition. As a result, even when the user abbreviates the facility name when speaking, the facility name can be supplemented according to the user's current position, so the possibility that the desired facility is extracted can be increased. Moreover, there is no need to register candidate readings that may be mispronounced or abbreviated; only the formal name of each facility is registered in the speech recognition dictionary. The amount of data in the speech recognition dictionary therefore does not increase.

(Third Embodiment)
Next, a third embodiment of the present invention will be described with reference to the drawings. FIG. 8 is a block diagram showing a configuration example of the speech recognition apparatus according to the third embodiment. Components identical to those of the first embodiment are denoted by the same reference numerals, and their description is partly omitted. In FIG. 8, reference numeral 8 denotes a speech recognition dictionary storage unit, which stores a speech recognition dictionary. As shown in FIG. 9, the speech recognition dictionary associates a dictionary ID, a facility ID, a facility name, and vocabulary patterns, each of which is obtained by dividing the facility name into a plurality of words and combining one or more of them. Examples of such patterns are as follows. When the facility name is “XX Prefectural Art Museum”, the name is divided into four words: “XX”, “XX Prefecture”, “Prefectural”, and “Art Museum”. Then, as shown in FIG. 9, the following vocabulary patterns are registered in the speech recognition dictionary: a first pattern combining “XX”, “Prefectural”, and “Art Museum”; a second pattern combining “XX”, “NULL”, and “Art Museum”; a third pattern combining “NULL”, “Prefectural”, and “Art Museum”; a fourth pattern combining “NULL”, “NULL”, and “Art Museum”; and a fifth pattern combining “XX Prefecture”, “NULL”, and “Art Museum”. Here, “NULL” indicates that no word is present.
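The dictionary entry described for FIG. 9 can be sketched as a small data structure. This is a minimal illustration, not the format used by the apparatus: the class and field names, the ID values, and the use of `None` for NULL are all assumptions, and text strings stand in for the readings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One vocabulary pattern: a sequence of word slots, where None plays the role of NULL.
Pattern = Tuple[Optional[str], ...]


@dataclass(frozen=True)
class DictionaryEntry:
    dictionary_id: int  # hypothetical value identifying the region
    facility_id: int    # hypothetical value identifying the facility
    facility_name: str
    patterns: Tuple[Pattern, ...]


def pattern_text(pattern: Pattern) -> str:
    """Join the non-NULL words of a pattern into the phrase it represents."""
    return " ".join(word for word in pattern if word is not None)


# The five patterns listed in the text for "XX Prefectural Art Museum".
ENTRY = DictionaryEntry(
    dictionary_id=1,
    facility_id=100,
    facility_name="XX Prefectural Art Museum",
    patterns=(
        ("XX", "Prefectural", "Art Museum"),
        ("XX", None, "Art Museum"),
        (None, "Prefectural", "Art Museum"),
        (None, None, "Art Museum"),
        ("XX Prefecture", None, "Art Museum"),
    ),
)

accepted: List[str] = [pattern_text(p) for p in ENTRY.patterns]
```

Each pattern reuses the same few words, so the entry stores word combinations rather than a separately registered reading for every abbreviation.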

Reference numeral 9 denotes a voice recognition unit, which determines the dictionary ID corresponding to the current position input from the current position detection unit 3 and masks all other dictionary IDs to exclude them from speech recognition. Here, masking a dictionary ID means that the dictionary with that ID is not loaded into the work area in which the voice recognition unit 9 actually performs speech recognition. The voice recognition unit 9 recognizes the input speech by comparing the speech input to the voice input unit 1 with the phoneme segment data of the vocabulary patterns in the speech recognition dictionary stored in the speech recognition dictionary storage unit 8. Specifically, when speech is input from the voice input unit 1, the voice recognition unit 9 first determines which region the current position input from the current position detection unit 3 belongs to, and extracts the dictionary ID of the matching region. Then, targeting only the speech recognition dictionary bearing the dictionary ID corresponding to the current position, the voice recognition unit 9 identifies and extracts the facility name of the facility having a vocabulary pattern whose reading matches the input speech. For example, whether the user utters “XX Prefectural Art Museum”, “XX Art Museum”, “XX Prefecture Art Museum”, “Prefectural Art Museum”, or “Art Museum”, the facility name “XX Prefectural Art Museum” can be extracted.
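The masking described above can be sketched as follows. This is a minimal illustration under stated assumptions: exact text matching stands in for the comparison against phoneme segment data, and the region names, dictionary IDs, facility names, and function names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical store: dictionary ID -> {recognizable phrase -> facility name}.
ALL_DICTIONARIES: Dict[int, Dict[str, str]] = {
    1: {
        "XX Prefectural Art Museum": "XX Prefectural Art Museum",
        "XX Art Museum": "XX Prefectural Art Museum",
        "Art Museum": "XX Prefectural Art Museum",
    },
    2: {
        "YY City Art Museum": "YY City Art Museum",
        "Art Museum": "YY City Art Museum",
    },
}

# Hypothetical mapping from the region of the current position to a dictionary ID.
REGION_TO_DICTIONARY_ID: Dict[str, int] = {"XX City": 1, "YY City": 2}


def load_work_area(region: str) -> Dict[str, str]:
    """Load only the dictionary for the current region; all others stay masked."""
    dictionary_id = REGION_TO_DICTIONARY_ID[region]
    return dict(ALL_DICTIONARIES[dictionary_id])


def recognize(utterance: str, region: str) -> Optional[str]:
    """Look up the utterance in the work area and return the facility name, if any."""
    work_area = load_work_area(region)
    return work_area.get(utterance)
```

With this arrangement the same abbreviated utterance, such as "Art Museum", resolves to a different facility depending on the region, and phrases belonging to masked dictionaries are never compared at all.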

Next, the operation of the speech recognition apparatus and the speech recognition method according to the third embodiment will be described. FIG. 10 is a flowchart showing the operation of the speech recognition apparatus and the speech recognition method according to the third embodiment.

In FIG. 10, the voice recognition unit 9 first checks whether speech has been input by the voice input unit 1 through a microphone or the like (step S21). When the voice recognition unit 9 determines that no speech has been input (NO in step S21), the process of step S21 is repeated. On the other hand, when the voice recognition unit 9 determines that speech has been input (YES in step S21), the voice recognition unit 9 receives the current position information detected by the current position detection unit 3 (step S22).

Based on the received current position information, the voice recognition unit 9 selects the dictionary whose dictionary ID belongs to the region corresponding to the current position (step S23). The voice recognition unit 9 then checks whether the selected dictionary contains a vocabulary pattern that matches the speech input from the voice input unit 1 (step S24). When the voice recognition unit 9 determines that no vocabulary pattern matches the speech input from the voice input unit 1 (NO in step S24), the process ends. In this case, the display control unit 5 may generate a message such as “No corresponding facility” and display it on a display unit (not shown).

On the other hand, when the voice recognition unit 9 determines that a vocabulary pattern matching the speech input from the voice input unit 1 exists (YES in step S24), the voice recognition unit 9 outputs the text data of the facility name of the facility having the recognized vocabulary pattern to the display control unit 5. The display control unit 5 displays the text of that facility name on a display unit (not shown) (step S25).
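Steps S21 to S25 of FIG. 10 can be followed in a short control-flow sketch. It is only an illustration: the function and parameter names are hypothetical, a sequence of optional strings stands in for polling the voice input unit, the dictionaries are keyed by region name rather than by dictionary ID, and text lookup replaces acoustic matching.

```python
from typing import Callable, Dict, Iterable, List, Optional

NO_MATCH_MESSAGE = "No corresponding facility"


def run_flow(
    inputs: Iterable[Optional[str]],
    get_region: Callable[[], str],
    dictionaries: Dict[str, Dict[str, str]],
    display: Callable[[str], None],
) -> None:
    """Process one utterance following the order of steps S21 to S25."""
    for utterance in inputs:
        if utterance is None:
            continue  # NO in S21: no speech yet, so S21 is repeated
        region = get_region()                    # S22: receive the current position
        selected = dictionaries.get(region, {})  # S23: select the dictionary for the region
        facility = selected.get(utterance)       # S24: look for a matching pattern
        if facility is None:
            display(NO_MATCH_MESSAGE)            # NO in S24: optional message, then end
        else:
            display(facility)                    # S25: display the facility name
        return


# Hypothetical usage with a list acting as the display.
shown: List[str] = []
sample = {"XX City": {"Art Museum": "XX Prefectural Art Museum"}}
run_flow([None, None, "Art Museum"], lambda: "XX City", sample, shown.append)
```

The current position is read only after speech has actually arrived, which matches the order of steps S21 and S22 in the flowchart.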

As described in detail above, according to the third embodiment, vocabulary patterns obtained by dividing a facility name into a plurality of words and combining one or more of them are registered in the speech recognition dictionary, together with the location of the facility, as targets of speech recognition. When a desired facility such as a destination is searched for by speech recognition, the facility name of a facility having a vocabulary pattern that matches the speech uttered by the user is extracted from the dictionary selected according to the user's current position. Thus, even when the user utters an abbreviated facility name, the possibility that the desired facility is extracted is increased, because the vocabulary patterns representing the facility name are registered in the speech recognition dictionary as recognition targets. In addition, only the vocabulary patterns obtained by dividing the facility name into a plurality of words and combining one or more of them are registered as reading candidates. Compared with the case where every reading that may be mispronounced or abbreviated is registered individually, the number of reading candidates registered in the speech recognition dictionary, and hence its data amount, can be reduced. Furthermore, since a dictionary is selected according to the user's current position and the number of facilities to be searched is narrowed down, the speech recognition processing time can be made shorter than when the entire speech recognition dictionary is searched.

In the first and third embodiments, a dictionary ID is set for each municipality (city, ward, town, or village), but the present invention is not limited to this. For example, a dictionary ID may be set for each prefecture, for each group of several prefectures, or for each group of several municipalities.

In the first and third embodiments, speech recognition is performed on the dictionary of a single dictionary ID, but the present invention is not limited to this. For example, the dictionary ID corresponding to the user's current position and the dictionary IDs of a plurality of regions adjacent to that region may together be made the targets of speech recognition.
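The variation that also targets adjacent regions can be sketched as a simple set computation. The adjacency table, region names, dictionary IDs, and function name below are hypothetical; the specification does not state how adjacency is stored.

```python
from typing import Dict, List, Set

# Hypothetical adjacency table and region-to-dictionary-ID mapping.
ADJACENT: Dict[str, List[str]] = {
    "A City": ["B City"],
    "B City": ["A City", "C City"],
    "C City": ["B City"],
}
DICTIONARY_ID: Dict[str, int] = {"A City": 1, "B City": 2, "C City": 3}


def target_dictionary_ids(region: str) -> Set[int]:
    """Return the dictionary IDs of the current region and of its adjacent regions."""
    regions = [region] + ADJACENT.get(region, [])
    return {DICTIONARY_ID[r] for r in regions}
```

Only the dictionaries whose IDs are in the returned set would be left unmasked, which helps when the user is near a regional boundary while still excluding distant regions.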

In the first to third embodiments, one facility is extracted as the result of speech recognition, but the present invention is not limited to this. For example, two or more facilities may be extracted and their facility names displayed on the display unit. In that case, when a plurality of facilities are extracted, the facility names may be displayed in order of proximity to the user's current position. For this purpose, the position of each facility, for example its latitude and longitude, is stored in the speech recognition dictionary storage units 2 and 7 in association with the facility name and the like. Alternatively, the facility position information held by the navigation apparatus may be used.
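Ordering several extracted facilities by proximity can be sketched as below. The specification does not say how distance is computed; the great-circle (haversine) formula used here is an assumption chosen for illustration, and the facility names and coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Facility = Tuple[str, float, float]  # (facility name, latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def sort_by_proximity(current: Tuple[float, float], candidates: List[Facility]) -> List[str]:
    """Return facility names ordered from nearest to farthest from the current position."""
    ordered = sorted(
        candidates,
        key=lambda f: haversine_km(current[0], current[1], f[1], f[2]),
    )
    return [name for name, _, _ in ordered]


# Hypothetical candidates extracted for one utterance.
CANDIDATES: List[Facility] = [
    ("Facility A", 35.1, 139.0),
    ("Facility B", 35.5, 139.0),
    ("Facility C", 35.0, 139.05),
]
display_order = sort_by_proximity((35.0, 139.0), CANDIDATES)
```

The coordinates could equally come from the navigation apparatus's own facility data, as the text notes; only the sort key would change.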

In the first embodiment, the speech recognition dictionary has a facility ID, but the present invention is not limited to this. For example, as shown in FIG. 11, for a given dictionary ID, a facility name may be stored in association with each of a plurality of main part names.

In the second embodiment, the speech recognition dictionary is not given a dictionary ID, but the present invention is not limited to this. For example, as shown in FIG. 12, each facility may be given a dictionary ID that differs from region to region. In this case, the voice recognition unit 4 receives the current position detected by the current position detection unit 3, determines which region the received current position belongs to, and extracts the dictionary ID of the matching region from the speech recognition dictionary storage unit 7. Then, targeting only the speech recognition dictionary bearing the dictionary ID corresponding to the current position, the voice recognition unit 4 identifies and extracts the facility name whose reading matches the input speech. Since a dictionary is thus selected according to the user's current position and the number of facilities to be searched is narrowed down, the speech recognition processing time can be made shorter than when the entire speech recognition dictionary is searched.

In addition, each of the above embodiments is merely one example of how the present invention may be embodied, and the technical scope of the present invention must not be construed as being limited by them. That is, the present invention can be implemented in various forms without departing from its spirit or its main features.

The present invention is useful for a speech recognition apparatus that recognizes speech uttered by a user and extracts facility names and the like.

FIG. 1 is a block diagram showing a configuration example of the speech recognition apparatus according to the first embodiment.
FIG. 2 is a diagram showing an example of the speech recognition dictionary of the speech recognition apparatus according to the first embodiment.
FIG. 3 is a flowchart showing the operation of the speech recognition apparatus and the speech recognition method according to the first embodiment.
FIG. 4 is a block diagram showing a configuration example of the speech recognition apparatus according to the second embodiment.
FIG. 5 is a diagram showing an example of virtual facility names generated by the voice synthesis unit of the speech recognition apparatus according to the second embodiment.
FIG. 6 is a diagram showing an example of the speech recognition dictionary of the speech recognition apparatus according to the second embodiment.
FIG. 7 is a flowchart showing the operation of the speech recognition apparatus and the speech recognition method according to the second embodiment.
FIG. 8 is a block diagram showing a configuration example of the speech recognition apparatus according to the third embodiment.
FIG. 9 is a diagram showing an example of the speech recognition dictionary of the speech recognition apparatus according to the third embodiment.
FIG. 10 is a flowchart showing the operation of the speech recognition apparatus and the speech recognition method according to the third embodiment.
FIG. 11 is a diagram showing a modification of the speech recognition dictionary of the speech recognition apparatus according to the first embodiment.
FIG. 12 is a diagram showing a modification of the speech recognition dictionary of the speech recognition apparatus according to the second embodiment.

Explanation of symbols

1 Voice input unit
2, 7, 8 Speech recognition dictionary storage unit
3 Current position detection unit
4, 9 Voice recognition unit
5 Display control unit
6 Voice synthesis unit

Claims

A voice recognition dictionary storage unit that stores a name of a facility and a main part name existing in a part of the name of the facility as a voice recognition dictionary in association with positional information of the facility;
A current position detector for detecting the current position of the user;
A voice input unit for inputting a voice pronounced by the user;
A voice recognition unit that searches the speech recognition dictionary for the main part names of facilities existing within a predetermined range corresponding to the current position detected by the current position detection unit, and extracts the name of a facility having a main part name whose reading matches the voice input by the voice input unit;
A speech recognition apparatus comprising:

The speech recognition apparatus according to claim 1, wherein the name of the main part is obtained by removing a part indicating a region from the name of the facility.

A current position detector for detecting the current position of the user;
A voice input unit for inputting a voice pronounced by the user;
A voice recognition dictionary storage unit that stores the name of the facility as a voice recognition dictionary;
A voice synthesizer for generating a virtual facility name by synthesizing voice related to an area within a predetermined range corresponding to the current position detected by the current position detector before or after the voice input by the voice input unit;
A voice recognition unit that searches the voice recognition dictionary using the virtual facility name generated by the voice synthesis unit as a key, and extracts a name of a facility that matches the virtual facility name;
A speech recognition apparatus comprising:

The speech recognition apparatus according to claim 3, wherein the speech recognition dictionary stores the name of the facility in association with the position information of the facility, and the voice recognition unit searches the speech recognition dictionary for the names of facilities existing within a predetermined range corresponding to the current position detected by the current position detection unit.

A voice recognition dictionary storage unit that stores a name of a facility and a vocabulary pattern that represents the name of the facility by a combination of a plurality of words as a voice recognition dictionary in association with the location information of the facility;
A current position detector for detecting the current position of the user;
A voice input unit for inputting a voice pronounced by the user;
A voice recognition unit that searches the speech recognition dictionary for the vocabulary patterns of facilities existing within a predetermined range corresponding to the current position detected by the current position detection unit, and extracts the name of a facility having a vocabulary pattern whose reading matches the voice input by the voice input unit;
A speech recognition apparatus comprising:

A first step of inputting a voice produced by the user at a voice input unit;
A second step of detecting a current position of the user by a current position detection unit when voice is input in the first step;
A third step of searching a speech recognition dictionary, in which the name of a facility and a main part name existing in a part of the name of the facility are associated with position information of the facility, for the main part names of facilities existing within a predetermined range corresponding to the current position of the user detected in the second step, and extracting the name of a facility having a main part name whose reading matches the voice input in the first step;
A speech recognition method comprising:

A first step of inputting a voice produced by the user at a voice input unit;
A second step of detecting the current position of the user by the current position detection unit when voice is input in the first step;
A third step in which a voice synthesizer generates a virtual facility name by synthesizing a voice related to an area within a predetermined range corresponding to the current position of the user detected in the second step, before or after the voice input in the first step;
A fourth step of searching a speech recognition dictionary using the virtual facility name generated in the third step as a key, and extracting a name of the facility that matches the virtual facility name;
A speech recognition method comprising:

A first step of inputting a voice produced by the user at a voice input unit;
A second step of detecting a current position of the user by a current position detection unit when voice is input in the first step;
A third step of searching a speech recognition dictionary, in which the name of a facility and a vocabulary pattern representing the name of the facility by a combination of a plurality of words are associated with position information of the facility, for the vocabulary patterns of facilities existing within a predetermined range corresponding to the current position detected in the second step, and extracting the name of a facility having a vocabulary pattern whose reading matches the voice input in the first step;
A speech recognition method comprising: