JP2013114107A

JP2013114107A - Communication system, utterance content generation device, utterance content generation program, and utterance content generation method

Info

Publication number: JP2013114107A
Application number: JP2011261245A
Authority: JP
Inventors: Takamasa Iio; 尊優飯尾; Masahiro Shiomi; 昌裕塩見; Kazuhiko Shinosawa; 一彦篠沢; Katsunori Shimohara; 勝憲下原; Norihiro Hagita; 紀博萩田
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2011-11-30
Filing date: 2011-11-30
Publication date: 2013-06-10
Anticipated expiration: 2031-11-30
Also published as: JP5866646B2

Abstract

PROBLEM TO BE SOLVED: To enable easy generation of the utterance content used in a confirming action. SOLUTION: A communication system 10 includes a robot 12 and a server 20. The robot identifies an article designated by a user. The server performs a pointing action for confirming the identified article and generates utterance content in response to a request from the robot. In generating the utterance content, the server generates a power set of the attributes of the identified article and, for each of the other articles existing near the identified article, a power set of the attributes of that article. The server deletes from the power set of the identified article every element it has in common with the power set of any of the other articles. From the remaining elements, the server selects those with the fewest words and, from among these, selects the element with the highest dissimilarity to the character strings of the attributes of the other articles existing near the identified article, and generates the utterance content from it.

Description

The present invention relates to a communication system, an utterance content generation device, an utterance content generation program, and an utterance content generation method. More particularly, it relates to such a system, device, program, and method that, for example, identify an article designated by a person through voice recognition and confirm, at least by voice, whether the identified article is the one the person designated.

An example of a conventional communication system of this type is disclosed in Patent Document 1. In that system, when the server identifies an article designated by a person, the robot utters a voice message that uses a “specific word” identifying the article to confirm whether it is the article the person designated. The word selected as the “specific word” is one that is not used to identify any other article near the person and that has a high recognition rate in voice recognition.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2009-223171 [G10L 15/22, G10L 15/00]

However, the communication system of Patent Document 1 requires the voice recognition rate of each word that may be used as a specific word to be obtained in advance, which is laborious.

Therefore, a main object of the present invention is to provide a novel communication system, utterance content generation device, utterance content generation program, and utterance content generation method.

Another object of the present invention is to provide a communication system, an utterance content generation device, an utterance content generation program, and an utterance content generation method that allow the words to be included in the utterance content to be selected easily.

To solve the above problem, the present invention adopts the following configuration. The reference numerals in parentheses and the supplementary explanations indicate correspondence with the embodiment described later in order to aid understanding of the invention, and do not limit the invention in any way.

A first invention is a communication system that identifies an article designated by a person through voice recognition and confirms by voice whether the identified article is the one the person designated, the system comprising: storage means for storing the name of each article and a plurality of words related to that article; first creation means for reading the words related to the identified article from the storage means and creating a first power set of those words; second creation means for reading, from the storage means, the words related to other articles existing near the identified article and creating a second power set of those words for each of the other articles; deletion means for deleting from the first power set every element that it has in common with any of the second power sets created by the second creation means; first selection means for selecting, from the elements remaining in the first power set after the deletion, the elements having the smallest number of words; second selection means for selecting, from the elements selected by the first selection means, the element having the highest dissimilarity to the character strings of the words related to the other articles; and utterance content generation means for generating the utterance content used for the voice confirmation so that it includes the words contained in the element selected by the second selection means.

In the first invention, the communication system (10) identifies an article designated by a person through voice recognition and confirms by voice whether the identified article is the one the person designated. The storage means (122, 204) stores the name of each article and a plurality of words related to that article. The words related to an article are, for example, words for attributes such as the article's type, color, and thickness, and describe the article supplementarily. The first creation means (200, S33) reads the words related to the identified article from the storage means and creates a first power set of those words. The second creation means (200, S35) reads, from the storage means, the words related to other articles existing near the identified article and creates a second power set of those words for each of the other articles. The deletion means (200, S37) deletes from the first power set every element that the first power set created by the first creation means has in common with any of the second power sets created by the second creation means. The first selection means (200, S39) selects, from the elements remaining in the first power set after the deletion, the elements having the smallest number of words. The second selection means (200, S41) selects, from the elements selected by the first selection means, the element having the highest dissimilarity to the character strings of the words related to the other articles. The utterance content generation means (200, S13) generates the utterance content used for the voice confirmation so that it includes the words contained in the element selected by the second selection means.

According to the first invention, the words selected differ from those of the articles near the identified article and have the highest character-string dissimilarity, so the voice recognition rate of each word need not be obtained in advance and the words to be included in the utterance content can be selected easily. The utterance content can therefore be generated easily.
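The power-set steps of the first invention (creation, deletion of common elements, and selection of the elements with the fewest words) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the sample attribute words are hypothetical, and the empty set is left out of each power set as an assumption, since it can never serve as a description. The final dissimilarity-based choice is omitted here.

```python
from itertools import combinations


def power_set(words):
    """Return every non-empty subset of `words` as a frozenset.

    Assumption: the empty subset is excluded because it describes nothing.
    """
    items = list(words)
    return {
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    }


def fewest_word_elements(target_words, nearby_articles_words):
    """Apply the creation, deletion, and first-selection steps.

    target_words: words related to the identified article.
    nearby_articles_words: one word list per other article near it.
    Returns the remaining elements that contain the fewest words.
    """
    first = power_set(target_words)              # first power set
    for other_words in nearby_articles_words:    # one second power set per article
        first -= power_set(other_words)          # delete the common elements
    if not first:
        return set()                             # nothing distinguishes the article
    fewest = min(len(element) for element in first)
    return {element for element in first if len(element) == fewest}


# Hypothetical example: a red, thick novel next to two other books.
remaining = fewest_word_elements(
    ["novel", "red", "thick"],
    [["novel", "blue", "thick"], ["magazine", "red", "thin"]],
)
```

In this example no single word is unique to the identified article, so the smallest remaining elements are the two-word sets {novel, red} and {red, thick}.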

A second invention is dependent on the first invention and further comprises calculation means for calculating the Levenshtein distance between the character string of each first word contained in the elements selected by the first selection means and the character string of each of a plurality of second words related to the other articles, wherein the second selection means selects the element containing the first word for which the Levenshtein distance calculated by the calculation means is largest.

In the second invention, the communication system further comprises calculation means (200, S91). The calculation means calculates the Levenshtein distance between the character string of each first word contained in the elements selected by the first selection means and the character string of each of a plurality of second words related to the other articles. That is, the similarity of the character strings (pronunciations) is calculated. The second selection means selects the element containing the first word for which the Levenshtein distance calculated by the calculation means is largest. Consequently, as described above, the element having the highest dissimilarity to the character strings of the words related to the other articles is selected.

According to the second invention, word selection is easy because only the Levenshtein distance between character strings needs to be calculated.
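The Levenshtein-based selection can be sketched as below. The distance function is the standard dynamic-programming algorithm. How the distances to several second words are combined is not stated in the text above, so the sketch assumes one possible reading: each word is scored by its smallest distance to any word of the other articles, and the element containing the best-scoring word is chosen. The function names and sample words are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]


def select_most_dissimilar(elements, other_words):
    """Pick the element whose best word is farthest from all other words.

    Assumption (not from the source): a word's score is its minimum
    distance to any word in `other_words`; an element's score is the
    maximum score among its words.
    """
    def word_score(word):
        return min(levenshtein(word, other) for other in other_words)

    def element_score(element):
        return max(word_score(word) for word in element)

    return max(elements, key=element_score)


# Hypothetical example: "thick" sounds close to "thin", "red" does not.
chosen = select_most_dissimilar([("thick",), ("red",)], ["thin"])
```

Here `levenshtein("thick", "thin")` is 2 and `levenshtein("red", "thin")` is 4, so the element containing "red" is chosen.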

A third invention is dependent on the first or second invention and further comprises: first determination means for determining whether any other article exists near the identified article; candidate determination means for determining, when the first determination means determines that no other article exists near the identified article, each of the words related to the identified article as a candidate for inclusion in the utterance content; and third selection means for selecting, from the words determined by the candidate determination means, the word having the highest dissimilarity to the character strings of the words related to other articles existing near the person, wherein the utterance content generation means generates the utterance content used for the voice confirmation so that it includes the word contained in the element selected by the third selection means.

In the third invention, the communication system further comprises first determination means (200, S31), candidate determination means (200, S43), and third selection means (200, S51). The first determination means determines whether any other article exists near the identified article. When the first determination means determines that no other article exists near the identified article, the candidate determination means determines each of the words related to the identified article as a candidate for inclusion in the utterance content. The third selection means selects, from the words determined by the candidate determination means, the word having the highest dissimilarity to the character strings of the words related to other articles existing near the person. In that case, the utterance content generation means generates the utterance content used for the voice confirmation so that it includes the word contained in the element selected by the third selection means.

According to the third invention, an appropriate word to include in the utterance content can be selected even when no other article exists near the identified article.

A fourth invention is dependent on the third invention and further comprises: second determination means for determining, when the first determination means determines that no other article exists near the identified article, whether any other article exists near the person; and fourth selection means for selecting, when the second determination means determines that no other article exists near the person, one word from the candidates determined by the candidate determination means in accordance with a predetermined rule, wherein the utterance content generation means generates the utterance content used for the voice confirmation so that it includes the word contained in the element selected by the fourth selection means.

In the fourth invention, the communication system further comprises second determination means (200, S45) and fourth selection means (200, S53). When the first determination means determines that no other article exists near the identified article, the second determination means determines whether any other article exists near the person. When the second determination means determines that no other article exists near the person, the fourth selection means selects one word from the candidates determined by the candidate determination means in accordance with a predetermined rule. For example, one word is selected at random from the candidates, or one related word determined in advance by the system administrator or the like is selected. In that case, the utterance content generation means generates the utterance content used for the voice confirmation so that it includes the word contained in the element selected by the fourth selection means.

In the fourth invention, as in the third invention, an appropriate word to include in the utterance content can be selected even when no other article exists near the identified article.
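The fallback branch of the third and fourth inventions, which applies when no other article is near the identified article, can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, the dissimilarity measure is passed in as a parameter, the toy measure `char_set_distance` is a stand-in rather than anything from the source, and combining several distances by taking the minimum is an assumption.

```python
import random


def char_set_distance(a, b):
    """Toy dissimilarity (stand-in): characters found in only one string."""
    return len(set(a) ^ set(b))


def fallback_word(article_words, near_person_words, dissimilarity,
                  rule=random.choice):
    """Choose one confirmation word when nothing is near the article.

    article_words: all words related to the identified article; every
        one of them becomes a candidate (candidate determination).
    near_person_words: words of other articles near the person; may be
        empty (the second determination).
    dissimilarity: function (word, other_word) -> number.
    rule: predetermined rule used when nothing is near the person,
        e.g. random choice or an administrator-defined pick.
    """
    candidates = list(article_words)
    if near_person_words:
        # Third selection. Assumption: score each candidate by its
        # smallest dissimilarity to any word near the person.
        return max(
            candidates,
            key=lambda word: min(dissimilarity(word, other)
                                 for other in near_person_words),
        )
    # Fourth selection: apply the predetermined rule.
    return rule(candidates)


# Hypothetical example: a thin, red article lies near the person.
word = fallback_word(["novel", "red", "thick"], ["thin", "red"],
                     char_set_distance)
```

With these inputs "red" scores 0 and "thick" scores 3, while "novel" scores 6, so "novel" is returned. Passing an empty list for `near_person_words` hands the choice to `rule`.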

A fifth invention is an utterance content generation device that is used in a communication system that identifies an article designated by a person through voice recognition and confirms by voice whether the identified article is the one the person designated, and that generates the utterance content for the voice confirmation, the device comprising: storage means for storing the name of each article and a plurality of words related to that article; first creation means for reading the words related to the identified article from the storage means and creating a first power set of those words; second creation means for reading, from the storage means, the words related to other articles existing near the identified article and creating a second power set of those words for each of the other articles; deletion means for deleting from the first power set every element that it has in common with any of the second power sets created by the second creation means; first selection means for selecting, from the elements remaining in the first power set after the deletion, the elements having the smallest number of words; second selection means for selecting, from the elements selected by the first selection means, the element having the highest dissimilarity to the character strings of the words related to the other articles; and utterance content generation means for generating the utterance content used for the voice confirmation so that it includes the words contained in the element selected by the second selection means.

A sixth invention is an utterance content generation program for an utterance content generation device that is used in a communication system that identifies an article designated by a person through voice recognition and confirms by voice whether the identified article is the one the person designated, that comprises storage means for storing the name of each article and a plurality of words related to that article, and that generates the utterance content for the voice confirmation, the program causing a processor of the utterance content generation device to execute: a first creation step of reading the words related to the identified article from the storage means and creating a first power set of those words; a second creation step of reading, from the storage means, the words related to other articles existing near the identified article and creating a second power set of those words for each of the other articles; a deletion step of deleting from the first power set every element that it has in common with any of the second power sets created in the second creation step; a first selection step of selecting, from the elements remaining in the first power set after the deletion step, the elements having the smallest number of words; a second selection step of selecting, from the elements selected in the first selection step, the element having the highest dissimilarity to the character strings of the words related to the other articles; and an utterance content generation step of generating the utterance content used for the voice confirmation so that it includes the words contained in the element selected in the second selection step.

A seventh invention is an utterance content generation method for an utterance content generation device that is used in a communication system that identifies an article designated by a person through voice recognition and confirms by voice whether the identified article is the one the person designated, that comprises storage means for storing the name of each article and a plurality of words related to that article, and that generates the utterance content for the voice confirmation, wherein a processor of the utterance content generation device: (a) reads the words related to the identified article from the storage means and creates a first power set of those words; (b) reads, from the storage means, the words related to other articles existing near the identified article and creates a second power set of those words for each of the other articles; (c) deletes from the first power set created in step (a) every element that it has in common with any of the second power sets created in step (b); (d) selects, from the elements remaining in the first power set after the deletion in step (c), the elements having the smallest number of words; (e) selects, from the elements selected in step (d), the element having the highest dissimilarity to the character strings of the words related to the other articles; and (f) generates the utterance content used for the voice confirmation so that it includes the words contained in the element selected in step (e).

The fifth to seventh inventions, like the first invention, allow the utterance content to be generated easily.

According to the present invention, the words selected differ from those of the articles near the identified article and have the highest character-string dissimilarity, so the voice recognition rate of each word need not be obtained in advance and the words to be included in the utterance content can be selected easily. The utterance content can therefore be generated easily.

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of the embodiment given with reference to the drawings.

FIG. 1 is an illustrative view showing an outline of a communication system according to one embodiment of the present invention.
FIG. 2 is an illustrative view showing the appearance of the robot shown in FIG. 1 as viewed from the front.
FIG. 3 is a block diagram showing the electrical configuration of the robot shown in FIG. 1.
FIG. 4 is a block diagram showing the electrical configuration of the server shown in FIG. 1.
FIG. 5 is an illustrative view showing one example of the article dictionary used in the embodiment of FIG. 1.
FIG. 6 is an illustrative view showing one example of the article local dictionary used in the embodiment of FIG. 1.
FIG. 7 is an illustrative view showing one example of the list of words used in the embodiment of FIG. 1.
FIG. 8 is an illustrative view for explaining the method of determining the words used in the confirmation action of the embodiment of FIG. 1.
FIG. 9 is an illustrative view showing one example of the memory map of the RAM shown in FIG. 4.
FIG. 10 is a flowchart showing the confirmation action determination process of the CPU shown in FIG. 4.
FIG. 11 is a flowchart showing the word selection process of the CPU shown in FIG. 4.
FIG. 12 is a flowchart showing the process, performed by the CPU shown in FIG. 4, of generating the attribute sets of article X.
FIG. 13 is a flowchart showing the process, performed by the CPU shown in FIG. 4, of generating the attribute sets of other nearby articles.
FIG. 14 is a flowchart showing the process, performed by the CPU shown in FIG. 4, of reducing the attribute sets of article X.
FIG. 15 is a flowchart showing the word determination (1) process of the CPU shown in FIG. 4.
FIG. 16 is a flowchart showing the process, performed by the CPU shown in FIG. 4, of generating candidates for the name of article X.
FIG. 17 is a flowchart showing the word determination (2) process of the CPU shown in FIG. 4.

Referring to FIG. 1, the communication system (hereinafter simply the “system”) 10 of this embodiment includes a communication robot (hereinafter simply the “robot”) 12. The robot 12 can communicate with people and with other robots using at least one of voice and body movement (gesture). The robot 12 can also access a server 20 via a network 14 such as a wireless LAN. Working with the server 20, the robot 12 of this embodiment identifies an article that a person 16 designates by voice, gaze, and pointing, and performs an action such as bringing that article to the person 16.

A wireless tag 18 is attached to the person 16, and markers for motion capture (not shown) are also attached. The wireless tag 18 transmits a radio signal containing identification information, which is used here to identify each person 16 individually. The markers are typically placed on the top of the head, both shoulders, both elbows, the tips of both index fingers, and so on, and are photographed, together with the whole of the person 16, by cameras 120 controlled by the server 20. In this embodiment three cameras 120 are provided; they photograph the person 16 from three directions and supply the resulting camera video data to the server 20.

The server 20 is connected to the network 14. Based on the camera video data input as described above, it executes motion capture processing that detects the movement of the markers, and it can also determine the position of the face of the person 16 by, for example, detecting skin-colored regions.

In this system 10, as described above, the robot 12 identifies the article designated by the person 16 as the target object. In this embodiment a book 24 is used as an example of an article that can be a target object. Attached to each book 24 (labeled “OBJ” in FIG. 1) is a wireless tag 18 that transmits a radio signal containing information (identification information) by which that book can be identified.

However, articles that can be target objects are not limited to the books of this embodiment; in a system for home use, any article in the home is conceivable. Naturally, the system can be used not only in the home but in any place where it works alongside people (a company, office, factory, and so on), in which case the various articles present at that place can be target objects.

Information about every article (book 24) handled by the system 10 is registered in an article dictionary database (DB) 122 connected to the server 20. The article dictionary DB 122 is described later.

The identification information transmitted from the wireless tag 18 attached to a person 16 handled by the system 10 is read by a wireless tag reader 208 (see FIG. 4) via one of a plurality of antennas 124 and passed to the server 20. The server 20 then identifies the person 16 from the identification information and determines (detects) the approximate position of the person 16 from the location of the antenna 124 that received (read) the identification information.

For simplicity FIG. 1 shows one robot 12, but there may be two or more. Likewise, the person 16 need not be limited to one; because each can be identified by the wireless tag 18, there may be several.

In the embodiment shown in FIG. 1, the positions of the robot 12, the person 16, the articles 24, and so on are expressed in the world coordinates of the space in which the system 10 is installed, whereas the robot 12 is controlled in robot coordinates. Although the details are not described here, the robot 12 therefore performs coordinate conversion between robot coordinates and world coordinates as needed in the processing described later.
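Because the source leaves the coordinate conversion unspecified, the following is only a minimal sketch of one common form of it. It assumes a planar case in which the robot's pose in world coordinates is given as a position (x, y) and a heading angle theta in radians; the function names and the pose representation are hypothetical.

```python
import math


def robot_to_world(point, robot_pose):
    """Convert a point from robot coordinates to world coordinates.

    point: (x, y) in the robot frame.
    robot_pose: (x, y, theta) of the robot in the world frame
        (assumed planar pose, theta in radians).
    """
    px, py = point
    rx, ry, theta = robot_pose
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotate by the robot heading, then translate by the robot position.
    return (rx + cos_t * px - sin_t * py,
            ry + sin_t * px + cos_t * py)


def world_to_robot(point, robot_pose):
    """Convert a point from world coordinates to robot coordinates."""
    wx, wy = point
    rx, ry, theta = robot_pose
    dx, dy = wx - rx, wy - ry
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Undo the translation, then rotate by the negative heading.
    return (cos_t * dx + sin_t * dy,
            -sin_t * dx + cos_t * dy)


# Hypothetical example: robot at (1, 2) facing along the world +y axis.
pose = (1.0, 2.0, math.pi / 2)
article_in_world = robot_to_world((1.0, 0.0), pose)
```

A point one unit straight ahead of this robot lies at approximately (1, 3) in world coordinates, and `world_to_robot` maps it back to (1, 0).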

図２を参照して、ロボット１２のハードウェアの構成について説明する。図２は、この実施例のロボット１２の外観を示す正面図である。ロボット１２は台車３０を含み、台車３０の下面にはロボット１２を自律移動させる２つの車輪３２および１つの従輪３４が設けられる。２つの車輪３２は車輪モータ３６（図３参照）によってそれぞれ独立に駆動され、台車３０すなわちロボット１２を前後左右の任意方向に動かすことができる。また、従輪３４は車輪３２を補助する補助輪である。したがって、ロボット１２は、配置された空間内を自律制御によって移動可能である。 The hardware configuration of the robot 12 will be described with reference to FIG. FIG. 2 is a front view showing the appearance of the robot 12 of this embodiment. The robot 12 includes a carriage 30, and two wheels 32 and one slave wheel 34 for autonomously moving the robot 12 are provided on the lower surface of the carriage 30. The two wheels 32 are independently driven by a wheel motor 36 (see FIG. 3), and the carriage 30, that is, the robot 12 can be moved in any direction, front, back, left, and right. The slave wheel 34 is an auxiliary wheel that assists the wheel 32. Therefore, the robot 12 can move in the arranged space by autonomous control.

台車３０の上には、円柱形のセンサ取り付けパネル３８が設けられ、このセンサ取り付けパネル３８には、多数の赤外線距離センサ４０が取り付けられる。これらの赤外線距離センサ４０は、センサ取り付けパネル３８すなわちロボット１２の周囲の物体（人間や障害物など）との距離を測定するものである。 A cylindrical sensor attachment panel 38 is provided on the carriage 30, and a large number of infrared distance sensors 40 are attached to the sensor attachment panel 38. These infrared distance sensors 40 measure the distance to the sensor mounting panel 38, that is, the object (human being, obstacle, etc.) around the robot 12.

なお、この実施例では、距離センサとして、赤外線距離センサを用いるようにしてあるが、赤外線距離センサに代えて、超音波距離センサやミリ波レーダなどを用いることもできる。 In this embodiment, an infrared distance sensor is used as the distance sensor, but an ultrasonic distance sensor, a millimeter wave radar, or the like can be used instead of the infrared distance sensor.

センサ取り付けパネル３８の上には、胴体４２が直立するように設けられる。また、胴体４２の前方中央上部（人の胸に相当する位置）には、上述した赤外線距離センサ４０がさらに設けられ、ロボット１２の前方の主として人間との距離を計測する。また、胴体４２には、その側面側上端部のほぼ中央から伸びる支柱４４が設けられ、支柱４４の上には、全方位カメラ４６が設けられる。全方位カメラ４６は、ロボット１２の周囲を撮影するものであり、後述する眼カメラ７０とは区別される。この全方位カメラ４６としては、たとえばＣＣＤやＣＭＯＳのような固体撮像素子を用いるカメラを採用することができる。なお、これら赤外線距離センサ４０および全方位カメラ４６の設置位置は、当該部位に限定されず適宜変更され得る。 A body 42 is provided upright on the sensor mounting panel 38. A further infrared distance sensor 40 of the kind described above is provided at the upper front center of the body 42 (a position corresponding to a human chest) and measures the distance mainly to a person in front of the robot 12. The body 42 is also provided with a support column 44 extending from approximately the center of the upper end of its side surface, and an omnidirectional camera 46 is mounted on the support column 44. The omnidirectional camera 46 photographs the surroundings of the robot 12 and is distinct from the eye camera 70 described later. A camera using a solid-state imaging device such as a CCD or CMOS sensor can be adopted as the omnidirectional camera 46. The installation positions of the infrared distance sensor 40 and the omnidirectional camera 46 are not limited to these locations and may be changed as appropriate.

胴体４２の両側面上端部（人の肩に相当する位置）には、それぞれ、肩関節４８Ｒおよび肩関節４８Ｌによって、上腕５０Ｒおよび上腕５０Ｌが設けられる。図示は省略するが、肩関節４８Ｒおよび肩関節４８Ｌは、それぞれ、直交する３軸の自由度を有する。すなわち、肩関節４８Ｒは、直交する３軸のそれぞれの軸廻りにおいて上腕５０Ｒの角度を制御できる。肩関節４８Ｒの或る軸（ヨー軸）は、上腕５０Ｒの長手方向（または軸）に平行な軸であり、他の２軸（ピッチ軸およびロール軸）は、その軸にそれぞれ異なる方向から直交する軸である。同様にして、肩関節４８Ｌは、直交する３軸のそれぞれの軸廻りにおいて上腕５０Ｌの角度を制御できる。肩関節４８Ｌの或る軸（ヨー軸）は、上腕５０Ｌの長手方向（または軸）に平行な軸であり、他の２軸（ピッチ軸およびロール軸）は、その軸にそれぞれ異なる方向から直交する軸である。 An upper arm 50R and an upper arm 50L are attached to the upper ends of both sides of the body 42 (positions corresponding to human shoulders) by a shoulder joint 48R and a shoulder joint 48L, respectively. Although not illustrated, the shoulder joint 48R and the shoulder joint 48L each have three orthogonal degrees of freedom. That is, the shoulder joint 48R can control the angle of the upper arm 50R around each of three orthogonal axes. One axis (yaw axis) of the shoulder joint 48R is parallel to the longitudinal direction (or axis) of the upper arm 50R, and the other two axes (pitch axis and roll axis) are orthogonal to that axis from mutually different directions. Similarly, the shoulder joint 48L can control the angle of the upper arm 50L around each of three orthogonal axes. One axis (yaw axis) of the shoulder joint 48L is parallel to the longitudinal direction (or axis) of the upper arm 50L, and the other two axes (pitch axis and roll axis) are orthogonal to that axis from mutually different directions.

また、上腕５０Ｒおよび上腕５０Ｌのそれぞれの先端には、肘関節５２Ｒおよび肘関節５２Ｌが設けられる。図示は省略するが、肘関節５２Ｒおよび肘関節５２Ｌは、それぞれ１軸の自由度を有し、この軸（ピッチ軸）の軸回りにおいて前腕５４Ｒおよび前腕５４Ｌの角度を制御できる。 In addition, an elbow joint 52R and an elbow joint 52L are provided at the respective distal ends of the upper arm 50R and the upper arm 50L. Although illustration is omitted, each of the elbow joint 52R and the elbow joint 52L has one degree of freedom, and the angle of the forearm 54R and the forearm 54L can be controlled around the axis (pitch axis).

前腕５４Ｒおよび前腕５４Ｌのそれぞれの先端には、人の手に相当するハンド５６Ｒおよびハンド５６Ｌがそれぞれ設けられる。これらのハンド５６Ｒおよび５６Ｌは、詳細な図示は省略するが、開閉可能に構成され、それによってロボット１２は、ハンド５６Ｒおよび５６Ｌを用いて物体を把持または挟持することができる。ただし、ハンド５６Ｒ，５６Ｌの形状は実施例の形状に限らず、人間の手に酷似した形状や機能を持たせるようにしてもよい。 At the tip of each of the forearm 54R and the forearm 54L, a hand 56R and a hand 56L corresponding to a human hand are provided. Although the detailed illustration is omitted, these hands 56R and 56L are configured to be openable and closable so that the robot 12 can grip or hold an object using the hands 56R and 56L. However, the shape of the hands 56R and 56L is not limited to the shape of the embodiment, and may have a shape and a function very similar to a human hand.

また、図示は省略するが、台車３０の前面，肩関節４８Ｒと肩関節４８Ｌとを含む肩に相当する部位，上腕５０Ｒ，上腕５０Ｌ，前腕５４Ｒ，前腕５４Ｌ，ハンド５６Ｒおよびハンド５６Ｌには、それぞれ、接触センサ５８（図３で包括的に示す）が設けられる。台車３０の前面の接触センサ５８は、台車３０への人間や他の障害物の接触を検知する。したがって、ロボット１２は、その自身の移動中に障害物との接触が有ると、それを検知し、直ちに車輪３２の駆動を停止してロボット１２の移動を急停止させることができる。また、その他の接触センサ５８は、当該各部位に触れたかどうかを検知する。なお、接触センサ５８の設置位置は、当該部位に限定されず、適宜な位置（人の胸，腹，脇，背中および腰に相当する位置）に設けられてもよい。 Although not illustrated, contact sensors 58 (shown collectively in FIG. 3) are provided on the front surface of the carriage 30, on the part corresponding to the shoulders including the shoulder joint 48R and the shoulder joint 48L, and on the upper arm 50R, the upper arm 50L, the forearm 54R, the forearm 54L, the hand 56R, and the hand 56L. The contact sensor 58 on the front surface of the carriage 30 detects contact of a person or another obstacle with the carriage 30. Therefore, when the robot 12 comes into contact with an obstacle while moving, it can detect the contact and immediately stop driving the wheels 32 to bring itself to an abrupt stop. The other contact sensors 58 detect whether the respective parts have been touched. The installation positions of the contact sensors 58 are not limited to these parts, and the sensors may be provided at other appropriate positions (positions corresponding to a person's chest, abdomen, sides, back, and waist).

胴体４２の中央上部（人の首に相当する位置）には首関節６０が設けられ、さらにその上には頭部６２が設けられる。図示は省略するが、首関節６０は、３軸の自由度を有し、３軸の各軸廻りに角度制御可能である。或る軸（ヨー軸）はロボット１２の真上（鉛直上向き）に向かう軸であり、他の２軸（ピッチ軸、ロール軸）は、それぞれ、それと異なる方向で直交する軸である。 A neck joint 60 is provided at the upper center of the body 42 (a position corresponding to a person's neck), and a head 62 is further provided thereon. Although illustration is omitted, the neck joint 60 has a degree of freedom of three axes, and the angle can be controlled around each of the three axes. A certain axis (yaw axis) is an axis directed directly above (vertically upward) of the robot 12, and the other two axes (pitch axis and roll axis) are axes orthogonal to each other in different directions.

頭部６２には、人の口に相当する位置に、スピーカ６４が設けられる。スピーカ６４は、ロボット１２が、それの周辺の人間に対して音声ないし音によってコミュニケーションを取るために用いられる。また、人の耳に相当する位置には、マイク６６Ｒおよびマイク６６Ｌが設けられる。以下、右のマイク６６Ｒと左のマイク６６Ｌとをまとめてマイク６６ということがある。マイク６６は、周囲の音、とりわけコミュニケーションを実行する対象である人間の音声を取り込む。さらに、人の目に相当する位置には、眼球部６８Ｒおよび眼球部６８Ｌが設けられる。眼球部６８Ｒおよび眼球部６８Ｌは、それぞれ眼カメラ７０Ｒおよび眼カメラ７０Ｌを含む。以下、右の眼球部６８Ｒと左の眼球部６８Ｌとをまとめて眼球部６８ということがある。また、右の眼カメラ７０Ｒと左の眼カメラ７０Ｌとをまとめて眼カメラ７０ということがある。 The head 62 is provided with a speaker 64 at a position corresponding to a human mouth. The speaker 64 is used for the robot 12 to communicate with humans around it by voice or sound. A microphone 66R and a microphone 66L are provided at a position corresponding to a human ear. Hereinafter, the right microphone 66R and the left microphone 66L may be collectively referred to as a microphone 66. The microphone 66 captures ambient sounds, in particular, the voices of humans who are subjects of communication. Furthermore, an eyeball part 68R and an eyeball part 68L are provided at positions corresponding to human eyes. The eyeball portion 68R and the eyeball portion 68L include an eye camera 70R and an eye camera 70L, respectively. Hereinafter, the right eyeball part 68R and the left eyeball part 68L may be collectively referred to as the eyeball part 68. The right eye camera 70R and the left eye camera 70L may be collectively referred to as an eye camera 70.

眼カメラ７０は、ロボット１２に接近した人間の顔や他の部分ないし物体などを撮影して、それに対応する映像信号を取り込む。この実施例では、ロボット１２は、この眼カメラ７０からの映像信号によって、人間１６の左右両目のそれぞれの視線方向（ベクトル）を検出する。その視線検出方法は具体的には、２つのカメラを用いるものとして特開２００４‐２５５０７４号公報に、１つのカメラを用いるものとして特開２００６‐１７２２０９号公報や特開２００６‐２８５５３１号公報開示されるが、ここではその詳細は重要ではないので、これらの公開公報を引用するにとどめる。 The eye camera 70 captures a human face approaching the robot 12, other parts or objects, and captures a corresponding video signal. In this embodiment, the robot 12 detects the line-of-sight directions (vectors) of the left and right eyes of the human 16 from the video signal from the eye camera 70. Specifically, the line-of-sight detection method is disclosed in Japanese Patent Application Laid-Open No. 2004-255074 as using two cameras, and Japanese Patent Application Laid-Open No. 2006-172209 and Japanese Patent Application Laid-Open No. 2006-285531 as using one camera. However, the details are not important here, so only those publications are cited.

ただし、人間１６の視線ベクトルの検出のためには、よく知られているアイマークレコーダなどが利用されてもよい。 However, a well-known eye mark recorder or the like may be used for detecting the line-of-sight vector of the human 16.

また、眼カメラ７０は、上述した全方位カメラ４６と同様のカメラを用いることができる。たとえば、眼カメラ７０は、眼球部６８内に固定され、眼球部６８は、眼球支持部（図示せず）を介して頭部６２内の所定位置に取り付けられる。図示は省略するが、眼球支持部は、２軸の自由度を有し、それらの各軸廻りに角度制御可能である。たとえば、この２軸の一方は、頭部６２の上に向かう方向の軸（ヨー軸）であり、他方は、一方の軸に直交しかつ頭部６２の正面側（顔）が向く方向に直交する方向の軸（ピッチ軸）である。眼球支持部がこの２軸の各軸廻りに回転されることによって、眼球部６８ないし眼カメラ７０の先端（正面）側が変位され、カメラ軸すなわち視線方向が移動される。なお、上述のスピーカ６４，マイク６６および眼カメラ７０の設置位置は、当該部位に限定されず、適宜な位置に設けられてよい。 The eye camera 70 can be the same kind of camera as the omnidirectional camera 46 described above. For example, the eye camera 70 is fixed inside the eyeball unit 68, and the eyeball unit 68 is attached at a predetermined position in the head 62 via an eyeball support unit (not shown). Although not illustrated, the eyeball support unit has two degrees of freedom, and its angle can be controlled around each of the two axes. For example, one of the two axes is an axis pointing toward the top of the head 62 (yaw axis), and the other is an axis (pitch axis) that is orthogonal both to the first axis and to the direction in which the front side (face) of the head 62 faces. Rotating the eyeball support unit around each of these two axes displaces the tip (front) side of the eyeball unit 68 or the eye camera 70 and thereby moves the camera axis, that is, the line-of-sight direction. The installation positions of the speaker 64, the microphone 66, and the eye camera 70 described above are not limited to these parts, and they may be provided at other appropriate positions.

このように、この実施例のロボット１２は、車輪３２の独立２軸駆動，肩関節４８の３自由度（左右で６自由度），肘関節５２の１自由度（左右で２自由度），首関節６０の３自由度および眼球支持部の２自由度（左右で４自由度）の合計１７自由度を有する。 As described above, the robot 12 of this embodiment includes independent two-axis driving of the wheels 32, three degrees of freedom of the shoulder joint 48 (6 degrees of freedom on the left and right), one degree of freedom of the elbow joint 52 (2 degrees of freedom on the left and right), It has a total of 17 degrees of freedom, 3 degrees of freedom for the neck joint 60 and 2 degrees of freedom for the eyeball support (4 degrees of freedom on the left and right).
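The total of 17 degrees of freedom can be checked by tallying the figures given above. The dictionary keys below are labels chosen here for illustration; only the numbers come from the description.

```python
# Degrees of freedom per component, as listed in the description.
DEGREES_OF_FREEDOM = {
    "wheels (independent two-axis drive)": 2,
    "shoulder joints (3 per side x 2)": 3 * 2,
    "elbow joints (1 per side x 2)": 1 * 2,
    "neck joint": 3,
    "eyeball supports (2 per side x 2)": 2 * 2,
}

total_dof = sum(DEGREES_OF_FREEDOM.values())
print(total_dof)  # 2 + 6 + 2 + 3 + 4 = 17
```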

図３はロボット１２の電気的な構成を示すブロック図である。この図３を参照して、ロボット１２は、ＣＰＵ８０を含む。ＣＰＵ８０は、マイクロコンピュータ或いはプロセッサとも呼ばれ、バス８２を介して、メモリ８４，モータ制御ボード８６，センサ入力／出力ボード８８および音声入力／出力ボード９０に接続される。 FIG. 3 is a block diagram showing the electrical configuration of the robot 12. With reference to FIG. 3, the robot 12 includes a CPU 80. The CPU 80 is also called a microcomputer or a processor, and is connected to the memory 84, the motor control board 86, the sensor input / output board 88 and the audio input / output board 90 via the bus 82.

メモリ８４は、図示は省略をするが、ＲＯＭ，ＨＤＤおよびＲＡＭを含む。ＲＯＭおよびＨＤＤには、ロボット１２の動作を制御するための制御プログラムが予め記憶される。たとえば、各センサの出力（センサ情報）を検知するための検知プログラムや、外部コンピュータとの間で必要なデータやコマンドを送受信するための通信プログラムなどが記録される。また、ＲＡＭは、ワークメモリやバッファメモリとして用いられる。 The memory 84 includes a ROM, an HDD, and a RAM (not shown). In the ROM and the HDD, a control program for controlling the operation of the robot 12 is stored in advance. For example, a detection program for detecting the output (sensor information) of each sensor, a communication program for transmitting / receiving necessary data and commands to / from an external computer, and the like are recorded. The RAM is used as a work memory or a buffer memory.

さらに、この実施例では、ロボット１２は、人間１６とのコミュニケーションをとるために発話したり、ジェスチャしたりできるように構成されているが、メモリ８４に、このような発話やジェスチャのための発話／ジェスチャ辞書８５Ａが設定されている。 Furthermore, in this embodiment, the robot 12 is configured to be able to speak and make a gesture to communicate with the human 16, but the memory 84 has a speech for such a speech and gesture. / Gesture dictionary 85A is set.

モータ制御ボード８６は、たとえばＤＳＰで構成され、各腕や首関節および眼球部などの各軸モータの駆動を制御する。すなわち、モータ制御ボード８６は、ＣＰＵ８０からの制御データを受け、右眼球部６８Ｒの２軸のそれぞれの角度を制御する２つのモータ（図３では、まとめて「右眼球モータ９２」と示す）の回転角度を制御する。同様にして、モータ制御ボード８６は、ＣＰＵ８０からの制御データを受け、左眼球部６８Ｌの２軸のそれぞれの角度を制御する２つのモータ（図３では、まとめて「左眼球モータ９４」と示す）の回転角度を制御する。 The motor control board 86 is constituted by, for example, a DSP, and controls the driving of the axis motors of the arms, the neck joint, the eyeball units, and so on. That is, the motor control board 86 receives control data from the CPU 80 and controls the rotation angles of the two motors (collectively shown as the "right eyeball motor 92" in FIG. 3) that control the angles of the two axes of the right eyeball unit 68R. Similarly, the motor control board 86 receives control data from the CPU 80 and controls the rotation angles of the two motors (collectively shown as the "left eyeball motor 94" in FIG. 3) that control the angles of the two axes of the left eyeball unit 68L.

また、モータ制御ボード８６は、ＣＰＵ８０からの制御データを受け、肩関節４８Ｒの直交する３軸のそれぞれの角度を制御する３つのモータと肘関節５２Ｒの角度を制御する１つのモータとの計４つのモータ（図３では、まとめて「右腕モータ９６」と示す）の回転角度を制御する。同様にして、モータ制御ボード８６は、ＣＰＵ８０からの制御データを受け、肩関節４８Ｌの直交する３軸のそれぞれの角度を制御する３つのモータと肘関節５２Ｌの角度を制御する１つのモータとの計４つのモータ（図３では、まとめて「左腕モータ９８」と示す）の回転角度を制御する。 The motor control board 86 also receives control data from the CPU 80 and controls the rotation angles of a total of four motors (collectively shown as the "right arm motor 96" in FIG. 3): three motors that control the angles of the three orthogonal axes of the shoulder joint 48R and one motor that controls the angle of the elbow joint 52R. Similarly, the motor control board 86 receives control data from the CPU 80 and controls the rotation angles of a total of four motors (collectively shown as the "left arm motor 98" in FIG. 3): three motors that control the angles of the three orthogonal axes of the shoulder joint 48L and one motor that controls the angle of the elbow joint 52L.

さらに、モータ制御ボード８６は、ＣＰＵ８０からの制御データを受け、首関節６０の直交する３軸のそれぞれの角度を制御する３つのモータ（図３では、まとめて「頭部モータ１００」と示す）の回転角度を制御する。そして、モータ制御ボード８６は、ＣＰＵ８０からの制御データを受け、車輪３２を駆動する２つのモータ（図３では、まとめて「車輪モータ３６」と示す）の回転角度を制御する。 Further, the motor control board 86 receives control data from the CPU 80, and controls three motors that control the angles of the three orthogonal axes of the neck joint 60 (in FIG. 3, collectively indicated as “head motor 100”). Control the rotation angle. The motor control board 86 receives control data from the CPU 80 and controls the rotation angles of the two motors (collectively indicated as “wheel motor 36” in FIG. 3) that drive the wheels 32.

モータ制御ボード８６にはさらにハンドアクチュエータ１０８が結合され、モータ制御ボード８６は、ＣＰＵ８０からの制御データを受け、ハンド５６Ｒ，５６Ｌの開閉を制御する。 A hand actuator 108 is further coupled to the motor control board 86, and the motor control board 86 receives control data from the CPU 80 and controls the opening and closing of the hands 56R and 56L.

なお、この実施例では、車輪モータ３６を除くモータは、制御を簡素化するためにステッピングモータ（すなわち、パルスモータ）を用いる。ただし、車輪モータ３６と同様に直流モータを用いるようにしてもよい。また、ロボット１２の身体部位を駆動するアクチュエータは、電流を動力源とするモータに限らず適宜変更された、たとえば、他の実施例では、エアアクチュエータが適用されてもよい。 In this embodiment, a motor other than the wheel motor 36 uses a stepping motor (that is, a pulse motor) in order to simplify the control. However, a DC motor may be used similarly to the wheel motor 36. The actuator that drives the body part of the robot 12 is not limited to a motor that uses a current as a power source, and may be changed as appropriate. For example, in another embodiment, an air actuator may be applied.

センサ入力／出力ボード８８は、モータ制御ボード８６と同様に、ＤＳＰで構成され、各センサからの信号を取り込んでＣＰＵ８０に与える。すなわち、赤外線距離センサ４０のそれぞれからの反射時間に関するデータがこのセンサ入力／出力ボード８８を通じてＣＰＵ８０に入力される。また、全方位カメラ４６からの映像信号が、必要に応じてセンサ入力／出力ボード８８で所定の処理を施してからＣＰＵ８０に入力される。眼カメラ７０からの映像信号も、同様にして、ＣＰＵ８０に入力される。また、上述した複数の接触センサ５８（図３では、まとめて「接触センサ５８」と示す）からの信号がセンサ入力／出力ボード８８を介してＣＰＵ８０に与えられる。音声入力／出力ボード９０もまた、同様に、ＤＳＰで構成され、ＣＰＵ８０から与えられる音声合成データに従った音声または声がスピーカ６４から出力される。また、マイク６６からの音声入力が、音声入力／出力ボード９０を介してＣＰＵ８０に与えられる。 Similar to the motor control board 86, the sensor input / output board 88 is configured by a DSP and takes in signals from each sensor and gives them to the CPU 80. That is, data relating to the reflection time from each of the infrared distance sensors 40 is input to the CPU 80 through the sensor input / output board 88. The video signal from the omnidirectional camera 46 is input to the CPU 80 after being subjected to predetermined processing by the sensor input / output board 88 as necessary. Similarly, the video signal from the eye camera 70 is also input to the CPU 80. Further, signals from the plurality of contact sensors 58 described above (collectively indicated as “contact sensors 58” in FIG. 3) are provided to the CPU 80 via the sensor input / output board 88. Similarly, the voice input / output board 90 is also configured by a DSP, and voice or voice in accordance with voice synthesis data provided from the CPU 80 is output from the speaker 64. In addition, voice input from the microphone 66 is given to the CPU 80 via the voice input / output board 90.

また、ＣＰＵ８０は、バス８２を介して通信ＬＡＮボード１０２に接続される。通信ＬＡＮボード１０２は、たとえばＤＳＰで構成され、ＣＰＵ８０から与えられた送信データを無線通信装置１０４に与え、無線通信装置１０４は送信データを、ネットワーク１４を介してサーバ２０に送信する。また、通信ＬＡＮボード１０２は、無線通信装置１０４を介してデータを受信し、受信したデータをＣＰＵ８０に与える。たとえば、送信データとしては、ロボット１２からサーバ２０への信号（コマンド）であったり、ロボット１２が行ったコミュニケーションについての動作履歴情報（履歴データ）などであったりする。このように、コマンドのみならず履歴データを送信するのは、メモリ８４の容量を少なくするためと、消費電力を抑えるためである。この実施例では、履歴データはコミュニケーションが実行される度に、サーバ２０に送信されたが、一定時間または一定量の単位でサーバ２０に送信されるようにしてもよい。 The CPU 80 is connected to the communication LAN board 102 via the bus 82. The communication LAN board 102 is configured by a DSP, for example, and provides transmission data provided from the CPU 80 to the wireless communication device 104, and the wireless communication device 104 transmits the transmission data to the server 20 via the network 14. In addition, the communication LAN board 102 receives data via the wireless communication device 104 and provides the received data to the CPU 80. For example, the transmission data may be a signal (command) from the robot 12 to the server 20, or operation history information (history data) regarding communication performed by the robot 12. The reason why the history data is transmitted as well as the command is to reduce the capacity of the memory 84 and to reduce power consumption. In this embodiment, the history data is transmitted to the server 20 every time communication is performed. However, the history data may be transmitted to the server 20 in units of a fixed time or a fixed amount.

さらに、ＣＰＵ８０は、バス８２を介して無線タグ読取装置１０６が接続される。無線タグ読取装置１０６は、アンテナ（図示せず）を介して、無線タグ１８（ＲＦＩＤタグ）から送信される識別情報の重畳された電波を受信する。そして、無線タグ読取装置１０６は、受信した電波信号を増幅し、当該電波信号から識別信号を分離し、当該識別情報を復調（デコード）してＣＰＵ８０に与える。図１によれば無線タグ１８は、ロボット１２が配置された会社の受付や一般家庭の居間などに居る人間１６や物品（この実施例では、本２４）に装着され、無線タグ読取装置１０６は、通信可能範囲内の無線タグ１８から発信される電波信号を検出する。 Further, the wireless tag reader 106 is connected to the CPU 80 via the bus 82. The wireless tag reader 106 receives a radio wave superimposed with identification information transmitted from the wireless tag 18 (RFID tag) via an antenna (not shown). Then, the RFID tag reader 106 amplifies the received radio wave signal, separates the identification signal from the radio wave signal, demodulates (decodes) the identification information, and supplies the identification information to the CPU 80. According to FIG. 1, the wireless tag 18 is attached to a person 16 or an article (in this embodiment, book 24) in the reception of the company where the robot 12 is disposed or in the living room of a general household, and the wireless tag reader 106 is The radio signal transmitted from the wireless tag 18 within the communicable range is detected.

なお、無線タグ１８は、アクティブ型であってもよいし、無線タグ読取装置１０６から送信される電波に応じて駆動されるパッシブ型であってもよい。 Note that the wireless tag 18 may be an active type or a passive type that is driven according to a radio wave transmitted from the wireless tag reader 106.

図４を参照して、サーバ２０のハードウェアの構成について説明する。図４に示すように、サーバ２０は、ＣＰＵ２００を含む。ＣＰＵ２００は、プロセッサとも呼ばれ、バス２０２を介して、メモリ２０４、カメラ制御ボード２０６、無線タグ読取装置２０８、ＬＡＮ制御ボード２１０、入力装置制御ボード２１２、およびモニタ制御ボード２１４に接続される。 The hardware configuration of the server 20 will be described with reference to FIG. As shown in FIG. 4, the server 20 includes a CPU 200. The CPU 200, also called a processor, is connected to the memory 204, the camera control board 206, the wireless tag reader 208, the LAN control board 210, the input device control board 212, and the monitor control board 214 via the bus 202.

ＣＰＵ２００は、サーバ２０の全体の制御を司る。メモリ２０４は、ＲＯＭ、ＲＡＭ、およびＨＤＤなどを包括的に示したものであり、サーバ２０の動作のためのプログラムを記録したり、ＣＰＵ２００が動作する際のワークエリアとして機能したりする。カメラ制御ボード２０６は、当該制御ボード２０６に接続されるカメラ１２０を制御するためのものである。 The CPU 200 governs overall control of the server 20. The memory 204 comprehensively shows ROM, RAM, HDD, and the like, and records a program for operating the server 20 and functions as a work area when the CPU 200 operates. The camera control board 206 is for controlling the camera 120 connected to the control board 206.

無線タグ読取装置２０８は、当該制御ボード２０８に接続されるアンテナ１２４を介して人間１６や物品（本）２４に装着された無線タグ１８から送信される識別情報の重畳された電波を受信する。そして、無線タグ読取装置２０８は、受信した電波信号を増幅し、当該電波信号から識別信号を分離し、当該識別情報を復調（デコード）してＣＰＵ２００に与える。アンテナ１２４は、ロボット１２が配置された会社の受付や一般家庭の各部屋などにくまなく配置され、システム１０が対象とするすべての物品（本）２４および人間１６の無線タグ１８から電波を受信できるようになっている。したがって、アンテナ１２４は複数存在するが、図１および図４では包括的に示している。 The wireless tag reader 208 receives, via the antennas 124 connected to it, radio waves on which identification information is superimposed and which are transmitted from the wireless tags 18 attached to the human 16 and the articles (books) 24. The wireless tag reader 208 then amplifies the received radio signal, separates the identification signal from the radio signal, demodulates (decodes) the identification information, and passes it to the CPU 200. The antennas 124 are placed throughout the reception area of the company or the rooms of the home in which the robot 12 is deployed, so that radio waves can be received from the wireless tags 18 of all articles (books) 24 and humans 16 targeted by the system 10. There are therefore a plurality of antennas 124, but they are shown collectively in FIGS. 1 and 4.

また、ＬＡＮ制御ボード２１０は、当該制御ボード２１０に接続される無線通信装置２１６を制御し、サーバ２０が外部のネットワーク１４に無線によってアクセスできるようにするものである。さらに、入力装置制御ボード２１２は、当該制御ボード２１２に接続される入力装置としてのたとえば、キーボードやマウスなどによる入力を制御するものである。そして、モニタ制御ボード２１４は、当該制御ボード２１４に接続されるモニタへの出力を制御するものである。 The LAN control board 210 controls the wireless communication device 216 connected to the control board 210 so that the server 20 can access the external network 14 wirelessly. Further, the input device control board 212 controls input by, for example, a keyboard or a mouse as an input device connected to the control board 212. The monitor control board 214 controls output to a monitor connected to the control board 214.

また、サーバ２０は、図示しないインターフェースによって、物品辞書ＤＢ１２２および音声認識辞書ＤＢ１２６（図１参照）に接続されている。 The server 20 is connected to the article dictionary DB 122 and the voice recognition dictionary DB 126 (see FIG. 1) by an interface (not shown).

メモリ２０４（ＲＡＭ）には、後述するように、物品ローカル辞書データ５０４ａ、音声認識ローカル辞書データ５０４ｂ、発話辞書データ５０４ｃおよび個人正誤情報データ５０４ｄが設定（記憶）されている。 As will be described later, article local dictionary data 504a, speech recognition local dictionary data 504b, utterance dictionary data 504c, and personal correct / incorrect information data 504d are set (stored) in the memory 204 (RAM).

物品ローカル辞書データ５０４ａに対応する物品ローカル辞書は、後述するように、物品辞書ＤＢ１２２から抽出された内容が登録される辞書である。サーバ２０は、ロボット１２が人間１６を認識した際に、当該人間１６の近傍に存在する物品（本）２４の情報だけを物品辞書ＤＢ１２２から抽出して物品ローカル辞書に登録する。音声認識ローカル辞書データ５０４ｂに対応する音声認識ローカル辞書は、後述するように、音声認識辞書ＤＢ１２６から抽出された内容が登録される辞書である。サーバ２０は、ロボット１２が人間１６を認識して物品ローカル辞書を作成すると、当該物品ローカル辞書に登録されている単語を音声認識するために必要な情報を音声認識辞書ＤＢ１２６から抽出して音声認識ローカル辞書に登録する。したがって、物品ローカル辞書および音声認識ローカル辞書は、人間１６の位置の変化に応じて動的に書き換えられる。このように、音声認識辞書ＤＢ１２６に記憶された音声認識辞書から音声認識ローカル辞書を作成し、音声認識に使用する辞書を小さくすることによって音声認識の対象となる単語（音素記号列）の数を少なくし、音声認識の処理にかかる時間を短くするとともに正しく音声認識できる割合を高めることができる。 The article local dictionary corresponding to the article local dictionary data 504a is a dictionary in which contents extracted from the article dictionary DB 122 are registered, as will be described later. When the robot 12 recognizes the person 16, the server 20 extracts only the information of the article (book) 24 existing in the vicinity of the person 16 from the article dictionary DB 122 and registers it in the article local dictionary. The speech recognition local dictionary corresponding to the speech recognition local dictionary data 504b is a dictionary in which contents extracted from the speech recognition dictionary DB 126 are registered, as will be described later. When the robot 12 recognizes the human 16 and creates an article local dictionary, the server 20 extracts information necessary for voice recognition of words registered in the article local dictionary from the voice recognition dictionary DB 126 and performs voice recognition. Register in the local dictionary. Therefore, the article local dictionary and the voice recognition local dictionary are dynamically rewritten in accordance with a change in the position of the person 16. 
In this manner, a speech recognition local dictionary is created from the speech recognition dictionary stored in the speech recognition dictionary DB 126, and the number of words (phoneme symbol strings) that are subject to speech recognition is reduced by reducing the dictionary used for speech recognition. As a result, the time required for voice recognition processing can be shortened and the rate of correct voice recognition can be increased.

発話辞書データ５０４ｃに対応する発話辞書は、サーバ２０がロボット１２に、人間１６に対して発話させる音声の内容を決定するために必要な情報を記憶している。また、個人正誤情報データ５０４ｄに対応する個人正誤情報は、システム１０が、人間１６が指示した物品（本）２４を特定することに最終的に成功したか否かを示す情報（音声認識の成功率）を、人間１６の識別情報（ユーザＩＤ）別に記憶している。 The utterance dictionary corresponding to the utterance dictionary data 504c stores the information that the server 20 needs in order to determine the content of the speech that it has the robot 12 utter to the human 16. The personal correct/incorrect information corresponding to the personal correct/incorrect information data 504d stores, for each piece of identification information (user ID) of a human 16, information indicating whether the system 10 ultimately succeeded in identifying the article (book) 24 indicated by that human 16 (the success rate of speech recognition).

次に、図５を参照して、物品辞書ＤＢ１２２に記憶される物品辞書は、たとえばユーコード（Ucode）のようなＩＤをそれぞれの物品の１つに割り当て、物品毎にその名称および属性などの必要な情報を登録している。なお、ユーコードは、具体的には、１２８ビットの数字からなり、３４０兆の１兆倍のさらに１兆倍の数の物品を個別に識別できるものである。ただし、この物品辞書ＤＢ１２２に使うＩＤは必ずしもこのようなユーコードである必要はなく、適宜の数字や記号の組み合わせからなるものであってよい。 Next, referring to FIG. 5, the article dictionary stored in the article dictionary DB 122 assigns an ID such as a Ucode to each article and registers the necessary information, such as the name and attributes, for each article. Specifically, a Ucode is a 128-bit number and can individually identify 340 trillion × 1 trillion × 1 trillion (about 3.4 × 10^38) articles. However, the ID used in the article dictionary DB 122 does not have to be a Ucode and may consist of any suitable combination of numbers and symbols.
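The size of the 128-bit ID space stated above can be verified with exact integer arithmetic. The variable names are chosen here for illustration.

```python
# Number of distinct values representable by a 128-bit ID.
id_space = 2 ** 128

# "340 trillion x 1 trillion x 1 trillion" = 340 x 10^12 x 10^12 x 10^12.
trillion = 10 ** 12
approximation = 340 * trillion * trillion * trillion

print(id_space)                    # exact size of the ID space
print(id_space // trillion ** 3)   # leading digits of the ID space
```

The exact value is slightly larger than the rounded figure, so the description's "340 trillion times 1 trillion times 1 trillion" is an approximation of 2^128.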

このような物品辞書は、システム１０（ロボット１２およびサーバ２０）が識別すべき対象物となるすべての、たとえば家庭内の物品をＩＤと文字列とで登録するものであり、いわばグローバル辞書に相当する。 Such an article dictionary registers all articles, for example, household articles, to be identified by the system 10 (robot 12 and server 20), for example, with IDs and character strings, which is equivalent to a global dictionary. To do.

物品辞書には、１つの物品（本）２４についての情報が１つのレコードとして登録されている。そして、上述したように、たとえば、１つのレコードには、本のＩＤ以外に「名称」および「属性」が記憶される。 In the article dictionary, information on one article (book) 24 is registered as one record. As described above, for example, “name” and “attribute” are stored in one record in addition to the book ID.

なお、図示は省略するが、本２４についての情報として、「著者」および「出版社」などがさらに記憶されてもよい。 Although illustration is omitted, “author” and “publisher” may be further stored as information about the book 24.

「名称」は、対応する本２４の表題（題号）である。また、「属性」は、本２４に関連する情報であり、この実施例では、本２４を補足的に説明する内容である。この実施例では、「属性」の項目には、本２４の種類（漫画、小説、雑誌など）、カバーの色および厚み（厚い、薄い）の情報が記憶されている。図５では分かり易く示すために、名称および属性の両方について、テキスト形式の文字列で記載してあるが、実際には、属性については、ローマ字で表記した文字列が記述されている。 “Name” is the title (title) of the corresponding book 24. The “attribute” is information related to the book 24, and in this embodiment, the content is supplementary explanation of the book 24. In this embodiment, information on the type of the book 24 (manga, novel, magazine, etc.), the color and thickness (thick, thin) of the cover is stored in the item “attribute”. In FIG. 5, for easy understanding, both the name and the attribute are described as text strings in text format, but actually, the attribute is described as a character string written in Roman letters.
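A record of the article dictionary, as described above, might be modeled as follows. This is a sketch under assumptions: the class and field names, the ID values, the titles, and the romanized attribute strings are all invented for illustration and are not taken from FIG. 5.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ArticleRecord:
    article_id: int              # e.g. a 128-bit Ucode (hypothetical value)
    name: str                    # title of the book
    attributes: Tuple[str, ...]  # kind, cover color, thickness (Roman letters)


# Hypothetical entries; the real dictionary contents are not reproduced here.
ARTICLE_DICTIONARY: Dict[int, ArticleRecord] = {
    0x01: ArticleRecord(0x01, "Book A", ("shousetsu", "aka", "atsui")),
    0x02: ArticleRecord(0x02, "Book B", ("zasshi", "ao", "usui")),
}


def attributes_of(article_id: int) -> Tuple[str, ...]:
    """Look up the attribute strings registered for one article."""
    return ARTICLE_DICTIONARY[article_id].attributes
```

Keeping one record per article keyed by ID mirrors the "one record per book" structure of the description.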

次に、音声認識辞書ＤＢ１２６について説明する。一般的に、音声認識辞書には、単語辞書と文法辞書とが存在するが、音声認識辞書ＤＢ１２６は単語辞書についてのデータを記憶する。文法辞書についての説明は省略する。図示は省略するが、音声認識辞書ＤＢ１２６には、物品の名称（この実施例では、本２４の名称）についてのテキスト形式の単語（または物品の識別情報）の各々に対応して、テキスト形式の単語に対応する音素記号形式（音素記号列）が記述されたテーブルのデータ（音声認識辞書データ）が記憶される。 Next, the speech recognition dictionary DB 126 will be described. In general, a speech recognition dictionary comprises a word dictionary and a grammar dictionary; the speech recognition dictionary DB 126 stores the data of the word dictionary, and a description of the grammar dictionary is omitted. Although not illustrated, the speech recognition dictionary DB 126 stores table data (speech recognition dictionary data) in which, for each text-format word (or article identification information) representing an article name (in this embodiment, the name of a book 24), the corresponding phoneme symbol form (phoneme symbol string) is described.

音声認識の処理では、入力された音声を音素に分解し、分解した各音素について当該音素を表す記号を生成する。これによって、入力された音声の単語に相当する音素記号列が生成される。次に、入力された音声の単語に相当する音素記号列が、音声認識辞書ＤＢ１２６（実際には、後述する音声認識ローカル辞書）に記憶されている音素記号列と比較される。そして、入力された音声の単語に相当する音素記号列ともっとも近い音素記号列を音声認識辞書ＤＢ１２６（音声認識ローカル辞書）内で特定し、この特定した音素記号列に対応して記述されている単語を音声認識結果として出力する。 In the speech recognition process, the input speech is decomposed into phonemes, and a symbol representing the phoneme is generated for each decomposed phoneme. Thus, a phoneme symbol string corresponding to the input speech word is generated. Next, the phoneme symbol string corresponding to the input speech word is compared with the phoneme symbol string stored in the speech recognition dictionary DB 126 (actually, a speech recognition local dictionary described later). The phoneme symbol string closest to the phoneme symbol string corresponding to the input speech word is specified in the speech recognition dictionary DB 126 (speech recognition local dictionary), and is described corresponding to the specified phoneme symbol string. The word is output as a speech recognition result.
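The matching step above can be sketched as a nearest-neighbor search over phoneme symbol strings. The description does not say how "closest" is measured; the sketch below assumes Levenshtein (edit) distance, and the dictionary contents and function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def recognize(input_phonemes, dictionary):
    """Return the word whose registered phoneme string is closest to the input."""
    return min(dictionary,
               key=lambda word: edit_distance(input_phonemes, dictionary[word]))


# Hypothetical local dictionary: word -> phoneme symbol string.
LOCAL_DICTIONARY = {
    "Kokoro": ["k", "o", "k", "o", "r", "o"],
    "Botchan": ["b", "o", "q", "ch", "a", "N"],
}

# An input with one misrecognized phoneme still maps to the nearest entry.
result = recognize(["k", "o", "k", "o", "r", "u"], LOCAL_DICTIONARY)
```

With fewer entries in the dictionary there are fewer candidates to compare and fewer near neighbors to confuse, which is the motivation for using a small local dictionary.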

As described above, in this system 10, when the person 16 indicates an article (book) 24 by voice, line of sight, and pointing, the robot 12 and the server 20 cooperate to identify the article (book) 24 indicated by the person 16, and the robot 12, for example, carries the identified article (book) 24 to the person 16. Hereinafter, this exchange between the person 16 and the system 10 may be referred to as communication.

More specifically, in this system 10, when the person 16 approaches the robot 12, the robot 12 recognizes the person 16 by means of the wireless tag 18. Connected to the server 20 are the article dictionary DB 122, in which all articles (books) 24 handled by the system 10 are registered, and the speech recognition dictionary DB 126, in which words for identifying the articles (books) 24 by speech recognition are registered. When the robot 12 recognizes the person 16, it transmits the identification information (user ID) of the person 16 to the server 20 and instructs the server 20 to create local dictionaries (an article local dictionary and a speech recognition local dictionary) from the article dictionary DB 122 and the speech recognition dictionary DB 126.

Upon receiving the instruction to create the local dictionaries, the server 20 determines the position of the person 16 recognized by the robot 12 and creates the article local dictionary by extracting from the article dictionary DB 122 only the records of articles (books) 24 located within a predetermined range of that person 16, for example within a radius of 5 m. Next, it creates the speech recognition local dictionary by extracting from the speech recognition dictionary DB 126 only the information needed for speech recognition of the articles (books) 24 registered in the article local dictionary.
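
The two extraction steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used by the server 20: the record layout (`name`, `position`), the function names, the sample data, and the use of straight-line distance in metres are all hypothetical.

```python
import math

def build_article_local_dictionary(article_records, person_position, radius_m=5.0):
    """Keep only the records of articles within radius_m of the person.

    Positions are assumed to be 2-D coordinates in metres.
    """
    return [
        record for record in article_records
        if math.dist(record["position"], person_position) <= radius_m
    ]

def build_speech_local_dictionary(speech_dictionary, article_local_dictionary):
    """Keep only the speech dictionary entries for articles in the local dictionary."""
    names = {record["name"] for record in article_local_dictionary}
    return {word: phonemes for word, phonemes in speech_dictionary.items() if word in names}

# Hypothetical data for illustration only.
ARTICLES = [
    {"name": "Manga A", "position": (1.0, 2.0)},
    {"name": "Magazine B", "position": (9.0, 9.0)},
]
SPEECH = {"Manga A": "m a n g a e e", "Magazine B": "z a s sh i b i i"}

local_articles = build_article_local_dictionary(ARTICLES, person_position=(0.0, 0.0))
local_speech = build_speech_local_dictionary(SPEECH, local_articles)
```

Restricting both dictionaries to nearby articles keeps the set of candidate words small, which is the stated purpose of the local dictionaries.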

Thereafter, the robot 12 says to the recognized person 16, for example, “Shall I bring you a book?” In response, the person 16 looks at and points to the article (book) 24 he or she wants and answers, for example, “Bring me Manga A.”

The robot 12 then identifies the article (book) 24 indicated by the person 16 by performing speech recognition on the person's utterance “Bring me Manga A,” estimating the line of sight of the person 16, and estimating the direction in which the pointing finger is directed.

Once the article (book) 24 indicated by the person 16 has been identified, the server 20 determines the content of the speech that the robot 12 utters to confirm the identified article (book) 24 with the person 16, for example, “Is it the red manga?”, and the robot 12 utters this while pointing to the article (book) 24 (named “Manga A”). In other words, the robot 12 performs an action (confirmation action) for confirming the identified article (book) 24.

At this time, the server 20 generates the utterance content for confirming the article (book) 24 using features (attributes) that differ from those of other articles present near the person 16 recognized by the robot 12 or near the identified article (book) 24, and using words that are easy to recognize by speech recognition.

Words that are easy to recognize are selected because, based on the finding that people tend to imitate the content of a robot's utterances, doing so makes it easier to identify the article (book) 24 that the person indicates by voice on subsequent occasions. However, if the utterance content is too short, speech recognition is more likely to fail, whereas if it is too long, the person is less likely to imitate it. In this embodiment, therefore, the utterance content is generated using the words of two to three attributes.

This is merely an example, however. When the attribute words are long, the utterance content may be determined using the word of only one attribute; when the attribute words are short, four or more may be used by registering additional attributes in the article dictionary.

When the robot 12 utters, for example, “Is it the red manga?” to confirm the article (book) 24 identified by the system 10, the person 16 replies to the robot 12 with, for example, “Yes” or “No”. The server 20 performs speech recognition on this reply and determines whether the article (book) 24 identified by the system 10 is the one the person 16 indicated. If it is not, the system checks whether the next candidate article (book) 24 is the one the person 16 indicated. If it is, the robot 12 carries the article (book) 24 to the person 16.

また、サーバ２０は、特定した物品（本）２４を確認した結果（正誤の情報）を累積的に記録する。サーバ２０は、この累積的に記録した正誤の情報を成功率（音声認識の成功率）として、発話内容を生成する際に参照する。 Further, the server 20 cumulatively records the result (correct / incorrect information) of confirming the specified article (book) 24. The server 20 refers to the cumulatively recorded correct / incorrect information as a success rate (speech recognition success rate) when generating the utterance content.

Except for the method of selecting words (the words used in the utterance) when generating the utterance content, the processing is substantially the same as that disclosed in Japanese Patent Application Laid-Open No. 2009-223171, previously filed by the applicant and already published, and is not essential to the present invention. Accordingly, only the method of determining the words is described in detail in this embodiment.

First, all the attributes Fc of the identified article (book) 24 (hereinafter referred to as “article X”) are acquired. For example, when the article (book) 24 named “Manga A” is identified, {manga, red, thin} is acquired as the attributes Fc. Next, sets of attributes (attribute sets) are generated. In this embodiment, the power set of the attributes Fc is calculated and used as the attribute sets; the same applies hereinafter. Here, the power set Power(Fc) of the attributes Fc is {{manga}, {red}, {thin}, {manga, red}, {red, thin}, {manga, thin}, {manga, red, thin}}.

次に、特定した物品Ｘの近傍（たとえば、３０ｃｍ以内）に存在する他の物品が検出され、検出された他の物品についての属性が取得される。他の物品が複数存在する場合には、他の物品毎に属性組が生成（べき集合が計算）される。 Next, other articles existing in the vicinity of the identified article X (for example, within 30 cm) are detected, and attributes of the detected other articles are acquired. When there are a plurality of other articles, an attribute set is generated (a power set is calculated) for each other article.

For example, when a book 24 named “Manga B” (hereinafter referred to as “article 1”) and a book 24 named “Magazine B” (hereinafter referred to as “article 2”) are present near article X, the attributes of each of these other articles 1 and 2 are acquired, the power set is calculated for each, and the respective attribute sets are generated. Here, the attributes of article 1 are {manga, blue, thin}, and the attributes of article 2 are {magazine, red, thick}. Accordingly, the power set Power(F1) of article 1 is {{manga}, {blue}, {thin}, {manga, blue}, {blue, thin}, {manga, thin}, {manga, blue, thin}}, and the power set Power(F2) of article 2 is {{magazine}, {red}, {thick}, {magazine, red}, {red, thick}, {magazine, thick}, {magazine, red, thick}}.

Next, the elements common to the attribute sets of the identified article 24 and the attribute sets of each nearby article 24 (common elements) are extracted, and the common elements are deleted from the attribute sets of the identified article 24. Specifically, the common elements E1 are extracted from the attribute sets of article X and article 1, and the common elements E2 are extracted from the attribute sets of article X and article 2.

In this embodiment, the common elements E1 are {{manga}, {thin}, {manga, thin}}, and the common element E2 is {{red}}. The set S of common elements is therefore the union of E1 and E2, namely {{manga}, {red}, {thin}, {manga, thin}}. Removing the elements of S from the power set Power(Fc) of the identified article X, that is, computing Power(Fc) − S, leaves {{manga, red}, {red, thin}, {manga, red, thin}}.
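
The calculation of Power(Fc) − S can be reproduced with a short Python sketch. The function names are hypothetical, and, following the listings in this description, the empty set is left out of each power set (an assumption about the intended definition).

```python
from itertools import combinations

def power_set(attributes):
    """Return all non-empty subsets of the attributes as frozensets."""
    items = list(attributes)
    return {
        frozenset(subset)
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    }

def distinguishing_sets(target_attributes, neighbour_attributes):
    """Compute Power(Fc) - S, where S is the union of the common elements E1, E2, ..."""
    candidates = power_set(target_attributes)
    for attributes in neighbour_attributes:
        # Removing Power(Fi) from the candidates removes exactly the common elements Ei.
        candidates -= power_set(attributes)
    return candidates

remaining = distinguishing_sets(
    {"manga", "red", "thin"},                                     # article X
    [{"manga", "blue", "thin"}, {"magazine", "red", "thick"}],  # articles 1 and 2
)
```

Every remaining attribute set distinguishes article X from both nearby articles, because none of them is also a subset of a neighbour's attributes.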

Next, among the elements of Power(Fc) − S, those with the smallest number of words are extracted. This operator is defined as, for example, min(). Thus, min(Power(Fc) − S) is {{manga, red}, {red, thin}}.

The elements with the smallest number of words are selected so that, as described above, the utterance content is long enough to make speech recognition easy yet short enough to be imitated by the person.
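
As a minimal sketch, the min() operator can be written in Python as a filter that keeps every element whose word count equals the smallest word count found. The function name `min_word_sets` is hypothetical, and returning an empty set for empty input is an assumption that this description does not address.

```python
def min_word_sets(candidates):
    """Return all candidate attribute sets that contain the fewest words."""
    if not candidates:
        return set()  # assumption: nothing to choose from
    smallest = min(len(candidate) for candidate in candidates)
    return {candidate for candidate in candidates if len(candidate) == smallest}

# Power(Fc) - S from the example in this description.
remaining = {
    frozenset({"manga", "red"}),
    frozenset({"red", "thin"}),
    frozenset({"manga", "red", "thin"}),
}
shortest = min_word_sets(remaining)
```

Note that the operator returns a set of elements rather than a single element, which is why a further selection step is needed when several elements tie.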

そして、ｍｉｎ（）の要素のうち、１つの要素が発話内容に使用される単語として選択される。この実施例では、特定した物品２４の近傍に存在する他の物品２４の属性との類似性が最も低い要素（非類似性が最も高い要素）が選択される。ただし、ここでの類似性は、音声で発音する場合に対比する単語が似ている度合を意味する。 Then, one element of min () elements is selected as a word used for the utterance content. In this embodiment, the element having the lowest similarity to the attribute of the other article 24 existing in the vicinity of the identified article 24 (the element having the highest dissimilarity) is selected. However, the similarity here means the degree to which the words to be compared are similar when they are pronounced by speech.

Specifically, the sum of the Levenshtein distances is calculated for the words (character strings) representing the attributes contained in each element, and the words representing the attributes contained in the element with the largest sum are selected as the words used in the utterance content. The Levenshtein distance (edit distance) is a numerical value indicating how much two character strings differ. Specifically, it is given as the minimum number of operations required to transform one character string into the other by inserting, deleting, or substituting characters.

ただし、この実施例では、正しく音声認識するようにするために、レーベンシュタイン距離を算出する場合には、ローマ字で示された文字列を用いるようにしてある。このローマ字で示された文字列は、物品辞書（物品ローカル辞書）に登録されている。ただし、属性に含まれる種類、色および厚みのそれぞれについて（同じ属性同士で）レーベンシュタイン距離が算出され、合計される。 However, in this embodiment, in order to correctly recognize the voice, when calculating the Levenshtein distance, a character string shown in Roman letters is used. The character string indicated in Roman letters is registered in the article dictionary (article local dictionary). However, the Levenshtein distance is calculated and totaled for each of the types, colors, and thicknesses included in the attributes (with the same attributes).

As shown in FIG. 8(A), when the Levenshtein distance is calculated for {manga, red} selected as described above, the attributes {type, color} are compared. Between the identified article X and article 1, the type is “manga” in both cases and matches completely, while for the color, changing “aka” into “ao” requires two character edits, so the Levenshtein distance LD is “2”. Between the identified article X and article 2, changing the type “manga” into “zasshi” requires four character substitutions and one character insertion, while the color is “aka” in both cases and matches completely, so the Levenshtein distance LD is “5”. Accordingly, when {manga, red}, that is, {type, color}, is used as the utterance content, the total Levenshtein distance LD is “7”.

On the other hand, as shown in FIG. 8(B), when the Levenshtein distance is calculated for {red, thin} selected as described above, the attributes {color, thickness} are compared. Between the identified article X and article 1, changing the color “aka” into “ao” requires two character edits, while the thickness is “usui” in both cases and matches completely, so the Levenshtein distance LD is “2”. Between the identified article X and article 2, the color is “aka” in both cases and matches completely, while changing the thickness “usui” into “atsui” requires one character insertion and one character substitution, so the Levenshtein distance LD is “2”. Accordingly, when {red, thin}, that is, {color, thickness}, is used as the utterance content, the total Levenshtein distance LD is “4”.

以上より、発話内容としては、レーベンシュタイン距離ＬＤの合計が大きい｛漫画、赤｝が選択される。 As described above, {manga, red} having a large total Levenshtein distance LD is selected as the utterance content.
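
The totals of 7 and 4 can be checked with the following Python sketch. The romanized strings are taken from the example above; the dictionary layout and the function names are assumptions made for illustration, and the distance routine is a standard dynamic-programming Levenshtein implementation rather than code from this description.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]

ARTICLE_X = {"type": "manga", "color": "aka", "thickness": "usui"}
NEIGHBOURS = [
    {"type": "manga", "color": "ao", "thickness": "usui"},    # article 1
    {"type": "zasshi", "color": "aka", "thickness": "atsui"}, # article 2
]

def total_distance(attribute_names, target, neighbours):
    """Sum the per-attribute distances between the target and every neighbour."""
    return sum(
        levenshtein(target[name], neighbour[name])
        for neighbour in neighbours
        for name in attribute_names
    )

def select_element(candidates, target, neighbours):
    """Pick the candidate whose words are least similar to the neighbours' words."""
    return max(candidates, key=lambda names: total_distance(names, target, neighbours))

best = select_element([("type", "color"), ("color", "thickness")], ARTICLE_X, NEIGHBOURS)
```

Comparing like with like (type against type, color against color) mirrors the statement that the distance is calculated between the same attributes and then totaled.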

次に、特定した物品（本）２４の近傍に他の物品（本）２４が存在しない場合についての発話内容に使用する単語の選択方法につい説明する。 Next, a method for selecting a word to be used for the utterance contents when no other article (book) 24 exists in the vicinity of the identified article (book) 24 will be described.

特定した物品（本）２４の近傍に他の物品（本）２４が存在しない場合には、特定した物品（本）２４のすべての属性が取得される。特定した物品（本）２４の近傍に他の物品（本）２４が存在しない場合には、基本的には、属性のうち、種類、色および厚みの文字列のいずれを用いても、当該特定の物品（本）２４を確認することができるからである。 If no other article (book) 24 exists in the vicinity of the identified article (book) 24, all attributes of the identified article (book) 24 are acquired. When there is no other article (book) 24 in the vicinity of the identified article (book) 24, basically, any of the character strings of type, color, and thickness among the attributes is used. This is because the article (book) 24 can be confirmed.

ただし、人間１６の近傍に他の物品（本）２４が存在する場合には、当該他の物品（本）２４の属性の文字列とのレーベンシュタイン距離の和が最大となる属性の単語を発話内容に使用する単語として選択（決定）する。ここでは、他の物品との間で、属性の種類、色、厚みのそれぞれについてレーベンシュタイン距離ＬＤを算出し、種類、色、厚みについてのレーベンシュタイン距離ＬＤの和を算出する。そして、最も和の大きい単語（種類、色、厚みについての単語）を、発話内容に使用する単語として選択する。 However, when another article (book) 24 exists in the vicinity of the person 16, a word having an attribute that maximizes the sum of the Levenshtein distance with the character string of the attribute of the other article (book) 24 is uttered. Select (determine) the word to use for the content. Here, the Levenshtein distance LD is calculated for each of the attribute type, color, and thickness with respect to another article, and the sum of the Levenshtein distance LD for the type, color, and thickness is calculated. Then, the word with the largest sum (words of type, color, and thickness) is selected as the word used for the utterance content.
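
A rough Python sketch of this single-attribute selection follows. It assumes the same romanized attribute strings as in the earlier example; the function names and sample data are hypothetical, and the fallback to the first attribute when no other article is present is an assumption (the description only says that any attribute would do in that case).

```python
def levenshtein(a, b):
    """Standard edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]

def select_single_attribute(target, others):
    """Return the attribute name whose word differs most from the other articles' words."""
    totals = {
        name: sum(levenshtein(word, other[name]) for other in others)
        for name, word in target.items()
    }
    # With no other articles every total is 0, so the first attribute is returned.
    return max(totals, key=totals.get)

ARTICLE_X = {"type": "manga", "color": "aka", "thickness": "usui"}
NEAR_PERSON = [
    {"type": "manga", "color": "ao", "thickness": "usui"},
    {"type": "zasshi", "color": "aka", "thickness": "atsui"},
]
chosen = select_single_attribute(ARTICLE_X, NEAR_PERSON)
```

With this sample data the type word has the largest total distance (5, against 2 for color and 2 for thickness), so the type would be used in the utterance.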

なお、レーベンシュタイン距離ＬＤを求める方法は、図８（Ａ）および（Ｂ）を用いて説明したとおりであり、重複した説明は省略する。 Note that the method for obtaining the Levenshtein distance LD is as described with reference to FIGS. 8A and 8B, and redundant description is omitted.

図９は図４に示したメモリ２０４（ＲＡＭ）のメモリマップ５００の一例を示す図解図である。図９に示すように、ＲＡＭは、プログラム記憶領域５０２およびデータ記憶領域５０４を含む。 FIG. 9 is an illustrative view showing one example of a memory map 500 of the memory 204 (RAM) shown in FIG. As shown in FIG. 9, the RAM includes a program storage area 502 and a data storage area 504.

プログラム領域５０２には、サーバ２０の全体制御を実行するための情報処理プログラムが記憶され、この情報処理プログラムは、動作制御プログラム５０２ａ、音声認識プログラム５０２ｂおよび発話内容生成プログラム５０２ｃなどによって構成される。これらのプログラムは、一度に全部または必要に応じて部分的に、ＨＤＤから読み出され、ＲＡＭのプログラム記憶領域５０２に記憶される。ただし、プログラムは、図示しないＲＯＭに記憶しておき、そこから読み出してもよい。 An information processing program for executing overall control of the server 20 is stored in the program area 502, and this information processing program includes an operation control program 502a, a speech recognition program 502b, an utterance content generation program 502c, and the like. These programs are read from the HDD all at once or partially as necessary, and stored in the program storage area 502 of the RAM. However, the program may be stored in a ROM (not shown) and read from there.

動作制御プログラム５０２ａは、ロボット１２の指差し動作をなどの身体動作についての制御情報を算出し、ロボット１２に指示するためのプログラムである。音声認識プログラム５０２ｂは、ロボット１２から送信される音声信号に対応する音声を認識するためのプログラムである。発話内容生成プログラム５０２ｃは、確認行動における発話内容を生成するためのプログラムである。 The motion control program 502a is a program for calculating control information regarding body motion such as pointing motion of the robot 12 and instructing the robot 12. The voice recognition program 502b is a program for recognizing a voice corresponding to a voice signal transmitted from the robot 12. The utterance content generation program 502c is a program for generating the utterance content in the confirmation action.

図示は省略するが、プログラム記憶領域５０２には、人間の視線方向を検出するためのプログラムなどの他のプログラムも記憶される。 Although illustration is omitted, the program storage area 502 also stores other programs such as a program for detecting a human gaze direction.

また、データ記憶領域５０４には、物品ローカル辞書データ５０４ａ、音声認識ローカル辞書データ５０４ｂ、発話辞書データ５０４ｃおよび個人正誤情報データ５０４ｄなどが記憶される。さらに、データ記憶領域５０４には、辞書登録フラグ５０４ｅが設けられる。 The data storage area 504 stores article local dictionary data 504a, speech recognition local dictionary data 504b, utterance dictionary data 504c, personal correct / incorrect information data 504d, and the like. Further, the data storage area 504 is provided with a dictionary registration flag 504e.

The article local dictionary data 504a is data for a partial article dictionary obtained by extracting, from the original article dictionary (global dictionary) stored in the article dictionary DB 122, the records of the articles (books) 24 located within a predetermined range centered on the person 16 identified by the user ID transmitted from the robot 12. The speech recognition local dictionary data 504b is data for a partial speech recognition dictionary extracted from the original speech recognition dictionary (global dictionary) stored in the speech recognition dictionary DB 126 in order to recognize the articles and the like registered in the partial article dictionary corresponding to the article local dictionary data 504a.

発話辞書データ５０４ｃは、サーバ２０がロボット１２に、人間１６に対して発話させる音声の内容すなわち発話内容を生成するために必要な情報についてのデータである。 The utterance dictionary data 504c is data about information necessary for the server 20 to make the robot 12 speak the voice content to the human 16, that is, the utterance content.

In this embodiment, the attributes of an article include type, color, and thickness, so the utterance content is determined using fixed sentence patterns such as the following. For example, when the words for color and type are used, the utterance content is set to 「○○の△△ですか？」 (“Is it the ○○ △△?”). Here, “○○” is filled with the word for the color (red, blue, yellow, brown, white, black, etc.) described in the attributes of the identified article X, and “△△” is filled with the word for the type (manga, novel, magazine, etc.) described in the attributes of the identified article X. The same applies hereinafter.

When the words for thickness and type are used, the utterance content is set to 「××△△ですか？」 (“Is it the ×× △△?”). Here, “××” is filled with the word for the thickness (thick or thin) described in the attributes of the identified article X. The same applies hereinafter.

Furthermore, when the words for color and thickness are used, the utterance content is set to 「○○の××本ですか？」 (“Is it the ○○, ×× book?”). Because this embodiment is described using the book 24 as the article, the word “book” (本) is uttered; when another kind of article is used, its common noun is used instead.
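
The three fixed sentence patterns can be sketched as a small template table in Python. The Japanese patterns are those given in this description; the table layout, the key names (`color`, `type`, `thickness`), and the function name are assumptions for illustration.

```python
# Fixed sentence patterns keyed by the combination of attributes used.
TEMPLATES = {
    frozenset({"color", "type"}): "{color}の{type}ですか？",
    frozenset({"thickness", "type"}): "{thickness}{type}ですか？",
    frozenset({"color", "thickness"}): "{color}の{thickness}{noun}ですか？",
}

def build_utterance(words, noun="本"):
    """Fill the pattern that matches the selected attributes.

    words maps attribute names to the words to utter; noun is the common
    noun of the article, used only when the type word is not uttered.
    """
    template = TEMPLATES[frozenset(words)]
    return template.format(noun=noun, **words)

utterance = build_utterance({"color": "赤色", "type": "漫画"})
```

For the running example this yields the confirmation utterance 「赤色の漫画ですか？」 (“Is it the red manga?”).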

The personal correct/incorrect information data 504d is data for a table in which the speech recognition success rate is described in correspondence with each user ID. For example, the success rate is expressed as the ratio (percentage) of the number of times speech recognition succeeded (the number of times the identified article (book) 24 was correct) to the cumulative number of communications.
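
As a minimal sketch, the per-user record and its success rate could be kept as follows. The class name, field names, and the choice to return `None` before any communication has taken place are assumptions; the description only specifies that the rate is the percentage of successful identifications among all communications.

```python
from dataclasses import dataclass

@dataclass
class PersonalRecord:
    """Cumulative correct/incorrect results for one user ID."""
    communications: int = 0
    successes: int = 0

    def update(self, was_correct):
        """Record the outcome of one confirmation ("yes" or "no" reply)."""
        self.communications += 1
        if was_correct:
            self.successes += 1

    @property
    def success_rate(self):
        """Success rate in percent, or None if there is no history yet."""
        if self.communications == 0:
            return None
        return 100.0 * self.successes / self.communications

record = PersonalRecord()
for reply_was_yes in (True, True, False, True):
    record.update(reply_was_yes)
```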

The dictionary registration flag 504e is a flag for determining whether to register a specific word in the speech recognition dictionary. A “specific word” here is an expression that has been chosen for use in the utterance of the confirmation action and that consists either of the word for one attribute indicating the article X together with the common noun for the article X, or of the words for a plurality of attributes. The dictionary registration flag 504e is a 1-bit register: when the flag is on, the data value “1” is set in the register, and when the flag is off, the data value “0” is set. The flag is turned on when the specific word is to be registered in the speech recognition dictionary and turned off when it is not.

たとえば、確認行動の発話において、物品Ｘの種類（ここでは、“漫画”）と色（ここでは、“赤”）の単語が使用される場合には、特定語として“赤色の漫画”が登録される。同様に、物品Ｘの種類と厚み（ここでは“厚い”）の単語が使用される場合には、特定語として“厚い漫画”が登録される。また、確認行動の発話において、物品Ｘの色と厚みの単語が使用される場合には、さらに物品Ｘの普通名詞の単語が用いられ、特定語として“赤色の厚い本”が登録される。説明は省略するが、他の種類、色および厚みの単語が使用される場合についても同様である。 For example, if the word of the type of article X (here “manga”) and color (here “red”) is used in the utterance of the confirmation action, “red cartoon” is registered as the specific word Is done. Similarly, when a word of the type and thickness (here, “thick”) of the item X is used, “thick cartoon” is registered as the specific word. Further, when the word of the color and thickness of the article X is used in the utterance of the confirmation action, the word of the common noun of the article X is further used, and “red thick book” is registered as the specific word. Although the description is omitted, the same applies to the case where words of other types, colors, and thicknesses are used.

なお、図示は省略するが、データ記憶領域５０４には、情報処理プログラムの実行に必要な他のデータが記憶され、必要に応じて、カウンタ（タイマ）や他のフラグ等も設けられる。 Although not shown, the data storage area 504 stores other data necessary for executing the information processing program, and is provided with a counter (timer), other flags, and the like as necessary.

図１０は、図４に示したＣＰＵ２００の確認行動決定処理のフロー図である。図１０に示すように、ＣＰＵ２００は、確認行動決定処理を開始すると、ステップＳ１で、ロボット１２の位置と、特定した物品Ｘの位置とから指差し動作を生成する。 FIG. 10 is a flowchart of the confirmation action determination process of the CPU 200 shown in FIG. As illustrated in FIG. 10, when starting the confirmation action determination process, the CPU 200 generates a pointing operation from the position of the robot 12 and the position of the identified article X in step S1.

In the next step S3, it is determined whether the user is being met for the first time. Here, the CPU 200 determines whether this is the first time it communicates with the person 16 recognized by the robot 12, that is, whether this is the first time it identifies an article (book) 24 indicated by that person 16. Specifically, the CPU 200 determines whether the user ID transmitted from the robot 12 is registered in the personal correct/incorrect information data 504d.

If “YES” in step S3, that is, if the user is being met for the first time, then in step S5 the word to be included in (used for) the utterance content of the confirmation action is set to the name of the identified article X, and the process proceeds to step S13. If “NO” in step S3, that is, if the user is not being met for the first time, it is determined in step S7 whether the speech recognition success rate for that user is 70% or higher. To do so, the CPU 200 refers to the personal correct/incorrect information data 504d and obtains the success rate described in correspondence with the user ID transmitted from the robot 12.

Although 70% is set in step S7 as the threshold for judging whether the speech recognition success rate is high or low, the threshold need not be limited to this value and can be changed freely according to the environment in which the system 10 is applied, the manner of use, and so on.

ステップＳ７で“ＮＯ”であれば、つまり音声認識の成功率が７０％未満であれば、ステップＳ５に進む。一方、ステップＳ７で“ＹＥＳ”であれば、つまり音声認識の成功率が７０％以上であれば、ステップＳ９で、後述する単語選択処理（図１１参照）を実行する。そして、ステップＳ１１で、辞書登録フラグ５０４ｅをオンして、ステップＳ１３に進む。図示および説明は省略したが、ＣＰＵ２００は、確認行動決定処理を開始したときに、辞書登録フラグ５０４ｅをオフする。 If “NO” in the step S7, that is, if the success rate of the speech recognition is less than 70%, the process proceeds to a step S5. On the other hand, if “YES” in the step S7, that is, if the success rate of the speech recognition is 70% or more, a word selection process (see FIG. 11) described later is executed in a step S9. In step S11, the dictionary registration flag 504e is turned on, and the process proceeds to step S13. Although illustration and description are omitted, the CPU 200 turns off the dictionary registration flag 504e when the confirmation action determination process is started.
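
The branching in steps S3 to S11 can be summarized with the Python sketch below. The function signature, the use of a dictionary keyed by user ID, and the callback for the word selection process are assumptions; only the branch conditions and the 70% threshold come from this description.

```python
def decide_confirmation_words(user_id, success_rates, article_name,
                              select_attribute_words, threshold=70.0):
    """Return (words to utter, dictionary registration flag).

    success_rates maps user IDs to success rates in percent; a missing
    user ID means the user is being met for the first time (step S3).
    """
    register_flag = False                   # flag is off when the process starts
    rate = success_rates.get(user_id)
    if rate is None or rate < threshold:    # S3 "YES" or S7 "NO"
        words = [article_name]              # S5: use the article's name
    else:                                   # S7 "YES"
        words = select_attribute_words()    # S9: word selection process
        register_flag = True                # S11: flag on
    return words, register_flag

# Hypothetical success rates for two known users.
RATES = {"user-1": 85.0, "user-2": 40.0}
result = decide_confirmation_words("user-1", RATES, "Manga A", lambda: ["manga", "red"])
```

Attribute-based wording is thus offered only to users whose past recognition results were good enough, while everyone else is asked using the article's name.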

ステップＳ１３では、発話内容を生成する。ここでは、ＣＰＵ２００は、発話辞書データ５０４ｃを参照して、確認行動における発話内容を生成する。このとき、ステップＳ５で選択された物品Ｘの名称を示す単語またはステップＳ９で選択された属性についての単語が用いられる。 In step S13, the utterance content is generated. Here, the CPU 200 refers to the utterance dictionary data 504c and generates the utterance content in the confirmation action. At this time, a word indicating the name of the article X selected in step S5 or a word regarding the attribute selected in step S9 is used.

次のステップＳ１５では、辞書登録フラグ５０４ｅがオンであるかどうかを判断する。ステップＳ１５で“ＮＯ”であれば、つまり辞書登録フラグ５０４ｅがオフであれば、そのままステップＳ１９に進む。一方、ステップＳ１５で“ＹＥＳ”であれば、つまり辞書登録フラグ５０４ｅがオンであれば、ステップＳ１７で、生成された発話内容に含まれる特定語を、物品Ｘに対応して当該ユーザＩＤとともに音声認識ＤＢ１２６内の音声認識辞書データに記憶（登録ないし追加）して、ステップＳ１９に進む。 In the next step S15, it is determined whether the dictionary registration flag 504e is on. If “NO” in the step S15, that is, if the dictionary registration flag 504e is turned off, the process directly proceeds to a step S19. On the other hand, if “YES” in the step S15, that is, if the dictionary registration flag 504e is turned on, the specific word included in the generated utterance content is voiced together with the user ID corresponding to the article X in a step S17. The data is stored (registered or added) in the speech recognition dictionary data in the recognition DB 126, and the process proceeds to step S19.

したがって、これ以降に、当該ユーザＩＤで特定されるユーザが当該物品Ｘを指示する場合に特定語を用いると、当該特定語を音声認識することにより、対応する当該物品Ｘを特定することができる。したがって、円滑なコミュニケーションを図ることができる。このような結果をもたらすのは、上述したように、ロボット１２が発声した内容を人間が真似する傾向があるためである。 Therefore, if the specific word is used when the user specified by the user ID indicates the article X thereafter, the corresponding article X can be specified by voice recognition of the specific word. . Therefore, smooth communication can be achieved. Such a result is brought about because a person tends to imitate the contents uttered by the robot 12 as described above.

ステップＳ１９では、ステップＳ１で生成した指差し動作と、ステップＳ１３で決定した発話内容とをロボット１２に送信して、確認行動決定処理を終了する。これに応じて、ロボット１２では、サーバ２０から指示された指差し動作を実行するとともに、サーバ２０から指示された発話内容を発話（音声出力）する。つまり、ロボット１２は、特定した物品Ｘについての確認行動を実行する。そして、図示は省略するが、その後のロボット１２へのユーザの返答（“はい”または“いいえ”）に応じて、サーバ２０は、この返答の内容を音声認識することにより、個人正誤情報データ５０４ｄを更新する。 In step S19, the pointing action generated in step S1 and the utterance content determined in step S13 are transmitted to the robot 12, and the confirmation action determination process is terminated. In response to this, the robot 12 performs a pointing operation instructed from the server 20 and utters (speech outputs) the utterance content instructed from the server 20. That is, the robot 12 performs a confirmation action for the identified article X. Although illustration is omitted, the server 20 recognizes the content of the response by voice according to the user's response (“Yes” or “No”) to the robot 12 thereafter, so that the personal correct / incorrect information data 504d. Update.

Note that this confirmation action determination process is executed when there is a request from the robot 12; when there is no such request, the process remains in a standby state.

FIG. 11 is a flowchart of the word selection process shown in step S9 of FIG. 10. As shown in FIG. 11, when the CPU 200 starts the word selection process, it determines in step S31 whether another article exists in the vicinity of the identified article X. Here, the CPU 200 determines whether there is an article 24 whose radio waves were received by the same antenna 124 that received the radio waves transmitted from the wireless tag 18 attached to the article X.

If "YES" in step S31, that is, if another article 24 exists in the vicinity of the article X, a process for generating the attribute sets of the article X, described later (see FIG. 12), is executed in step S33, and a process for generating the attribute sets of the other articles in the vicinity of the article X, described later (see FIG. 13), is executed in step S35. Subsequently, in step S37, a process for reducing the attribute sets of the article X, described later (see FIG. 14), is executed, and in step S39 the attribute set having the smallest number of words is extracted from the attribute sets of the article X. Then, in step S41, the word selection (1) process described later (see FIG. 15) is executed, and the process returns to the confirmation action determination process.

If "NO" in step S31, that is, if no other article 24 exists in the vicinity of the article X, a process for determining candidate names for the article X, described later (see FIG. 16), is executed in step S43. In the next step S45, it is determined whether another article 24 exists in the vicinity of the user 16. Here, the CPU 200 determines whether there is an article 24 whose radio waves were received by the same antenna 124 that received the radio waves transmitted from the wireless tag 18 attached to the user 16 indicated by the identified user ID.

If "NO" in step S45, that is, if no other article 24 exists in the vicinity of the user, one word is selected from the candidate names according to a predetermined rule in step S53, and the process returns to the confirmation action determination process. For example, in step S53 the CPU 200 randomly selects one word from the candidate names, or selects the word for one attribute determined in advance by an administrator or the like of the system 10 or the server 20.

On the other hand, if "YES" in step S45, that is, if another article 24 exists in the vicinity of the user, the attributes of the other article 24 in the vicinity are searched for in step S47 by referring to the article local dictionary data 504a. Here, the CPU 200 searches for the article ID of the other article 24. In the subsequent step S49, the attributes of the other article 24 are acquired from the record described in association with the retrieved article ID. The same applies hereinafter whenever the attributes of an article 24 are searched for or acquired. Then, in step S51, the word selection (2) process described later (see FIG. 17) is executed, and the process returns to the confirmation action determination process.
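As a non-limiting illustration, the branching of FIG. 11 (steps S31 and S45) can be sketched as follows, assuming that "vicinity" means "received by the same antenna 124" as described above. The function name `choose_branch`, the antenna IDs, and the string labels returned are hypothetical.

```python
def choose_branch(article_antenna, user_antenna, other_article_antennas):
    """Return the applicable branch of FIG. 11 and the other articles involved.

    article_antenna: ID of the antenna that received the tag of article X.
    user_antenna: ID of the antenna that received the tag of the user.
    other_article_antennas: mapping of other article ID -> receiving antenna ID.
    """
    near_article = [a for a, ant in other_article_antennas.items()
                    if ant == article_antenna]
    if near_article:                       # step S31 "YES"
        return "S33-S41", near_article
    near_user = [a for a, ant in other_article_antennas.items()
                 if ant == user_antenna]
    if near_user:                          # step S31 "NO", step S45 "YES"
        return "S43, S47-S51", near_user
    return "S43, S53", []                  # step S31 "NO", step S45 "NO"


# Sample data (assumed values, for illustration only).
others = {"book-A": "ant-1", "book-B": "ant-2"}
```

With this sample data, an article X received by "ant-1" takes the power-set branch, whereas an article X received by an antenna that saw no other article falls back to the articles near the user, or to the predetermined rule of step S53.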

As shown in FIG. 12, when the CPU 200 starts the process for generating the attribute sets of the article X shown in step S33 of FIG. 11, it searches for the attributes of the article X in step S61. In the subsequent step S63, the attributes of the article X are acquired.

Subsequently, in step S65, the power set of the attributes is calculated. Here, as described above, the power set Power(Fc) of the attributes Fc of the article X is calculated. Then, in step S67, the power set Power(Fc) is generated as the attribute sets of the article X, and the process returns to the word selection process.
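As a non-limiting illustration, the power set calculation of step S65 can be sketched with the Python standard library. The function name `power_set` and the attribute words "book", "red", and "thick" are assumptions for illustration; the mathematical power set includes the empty set, so it is kept here.

```python
from itertools import chain, combinations


def power_set(attributes):
    """Return every subset of the attribute words, each as a frozenset."""
    items = list(attributes)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}


# Assumed attributes Fc of article X: type, color, and thickness words.
attribute_sets_x = power_set(["book", "red", "thick"])
```

For three attribute words the result has 2^3 = 8 elements, each of which is one candidate attribute set of the article X.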

As shown in FIG. 13, when the CPU 200 starts the process for generating the attribute sets of another article in the vicinity shown in step S35 of FIG. 11, it searches for the attributes of another article K (K is a natural number for identifying the other article; the same applies hereinafter) in step S71, and acquires the attributes of the other article K in step S73. In the next step S75, the power set Power(FK) of the attributes is calculated. Then, in step S77, the power set Power(FK) is generated as the attribute sets of the other article K, and the process returns to the word selection process.

Note that the process of FIG. 13 for generating the attribute sets of other articles in the vicinity is executed for each of the other articles.

As shown in FIG. 14, when the CPU 200 starts the process for reducing the attribute sets of the article X shown in step S37 of FIG. 11, it extracts, in step S81, the elements common to the power set Power(Fc) of the attribute sets of the article X and the power set Power(FK) of the attribute sets of the other article K. When a plurality of other articles K exist, the common elements are extracted for each of the other articles K. In the next step S83, the common elements are deleted from the attribute sets of the article X, and the process returns to the word selection process.
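As a non-limiting illustration, the reduction of steps S81 and S83 amounts to a set difference between the power set of the article X and the power set of each other article K. The function names and the sample attribute words below are assumptions for illustration.

```python
from itertools import combinations


def power_set(attributes):
    """All subsets of the attribute words, each as a frozenset."""
    items = list(attributes)
    return {frozenset(c) for size in range(len(items) + 1)
            for c in combinations(items, size)}


def reduce_attribute_sets(attributes_x, attributes_of_others):
    """Delete from Power(Fc) every element shared with any Power(FK)."""
    remaining = power_set(attributes_x)
    for attributes_k in attributes_of_others:         # once per other article K
        common = remaining & power_set(attributes_k)  # step S81: common elements
        remaining -= common                           # step S83: delete them
    return remaining


# Assumed sample: article X and two other articles in its vicinity.
remaining = reduce_attribute_sets(
    ["book", "red", "thick"],
    [["book", "blue", "thick"], ["book", "red", "thin"]],
)
```

In this sample neither "red" nor "thick" alone distinguishes the article X, because each is shared with one neighbor, so only the combinations containing both words survive the reduction.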

As shown in FIG. 15, when the CPU 200 starts the word selection (1) process shown in step S41 of FIG. 11, it calculates the sum of the Levenshtein distances LD between the character string of an attribute set of the article X and the character strings of the attribute sets of the other articles K. In the next step S91, the attribute set for which the sum of the Levenshtein distances LD is largest is selected as the words to be used in the confirmation action, and the process returns to the word selection process.

Note that if only one attribute set of the article X remains as a result of the reduction process, that attribute set is selected as the words to be used in the confirmation action without executing the word selection (1) process.
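As a non-limiting illustration, steps S39 and S91 can be sketched as follows. The sketch assumes that each remaining attribute set is given as a tuple of words, that its character string is the concatenation of those words, and that the other articles' attributes are supplied as a flat list of strings; the embodiment does not fix these details, and the names `levenshtein` and `select_words` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def select_words(candidates, other_strings):
    """Pick the shortest attribute set that is least similar to the others."""
    fewest = min(len(candidate) for candidate in candidates)
    shortest = [c for c in candidates if len(c) == fewest]   # step S39
    if len(shortest) == 1:
        return shortest[0]          # nothing left to compare

    def total_distance(candidate):
        text = "".join(candidate)   # assumed string form of an attribute set
        return sum(levenshtein(text, other) for other in other_strings)

    return max(shortest, key=total_distance)                 # step S91
```

For example, given the candidates `("red",)` and `("thin",)` and a neighboring attribute "thick", the sketch prefers "red", since "thin" is only two edits away from "thick".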

As shown in FIG. 16, when the CPU 200 starts the process for generating candidate names for the article X shown in step S43 of FIG. 11, it searches for the attributes of the article X in step S101. In the next step S103, the attributes of the article X (in this embodiment, the words for type, color, and thickness) are acquired. Then, in step S105, each attribute is determined as a candidate word to be used in the confirmation action, and the process returns to the word selection process.

As shown in FIG. 17, when the CPU 200 starts the word selection (2) process shown in step S51 of FIG. 11, it calculates, in step S111, the sum of the Levenshtein distances LD between the character string of a candidate attribute and the character strings of the attributes of each of the other articles near the user's position. Here, for each of the other articles, the Levenshtein distance LD is obtained for each of type, color, and thickness, and the sum is calculated. Then, in step S113, the attribute (type, color, or thickness) for which the sum of the Levenshtein distances LD is largest is selected as the word to be used in the confirmation action, and the process returns to the word selection process.
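As a non-limiting illustration, steps S111 and S113 can be sketched as follows. The dictionary keys "type", "color", and "thickness", the sample words, and the function names are assumptions for illustration; the Levenshtein distance is written here as a memoized recursion, which is adequate only for short words.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def levenshtein(a: str, b: str) -> int:
    """Edit distance by memoized recursion (suitable for short words only)."""
    if not a:
        return len(b)
    if not b:
        return len(a)
    cost = 0 if a[-1] == b[-1] else 1
    return min(levenshtein(a[:-1], b) + 1,            # deletion
               levenshtein(a, b[:-1]) + 1,            # insertion
               levenshtein(a[:-1], b[:-1]) + cost)    # substitution


def select_word_near_user(candidate_attributes, articles_near_user):
    """Step S113: pick the candidate word with the largest summed distance."""
    def total_distance(word):
        # Step S111: sum over every nearby article and each of its attributes.
        return sum(levenshtein(word, article[key])
                   for article in articles_near_user
                   for key in ("type", "color", "thickness"))

    return max(candidate_attributes.values(), key=total_distance)


# Assumed sample data: article X and one article near the user.
article_x = {"type": "book", "color": "red", "thickness": "thick"}
near_user = [{"type": "book", "color": "blue", "thickness": "thin"}]
selected = select_word_near_user(article_x, near_user)
```

In this sample the summed distances are 7 for "book", 12 for "red", and 11 for "thick", so "red" is selected as the word least likely to be confused with the attributes of the nearby article.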

According to this embodiment, attribute sets that differ from those of other articles existing in the vicinity of the identified article are extracted, and the attribute set having the largest Levenshtein distance from the attribute sets of the other articles is selected as the words to be used in the confirmation action. It is therefore unnecessary to measure speech recognition rates in advance, and the words to be used in the confirmation action can be determined easily. Consequently, the utterance content of the confirmation action can be generated easily.

Further, according to this embodiment, the attribute set having the smallest number of words is extracted from among the extracted attribute sets, so that words that are easy for a person to imitate can be included in the utterance content.

Furthermore, according to this embodiment, an attribute set that differs from those of other articles is included in the utterance content, and the word (specific word) determined by that attribute set is additionally registered in the speech recognition dictionary in association with the user. Thereafter, even when the specific word is used, the article designated by the specific word can be identified by speech recognition. Communication with the user can therefore be carried out smoothly.

In this embodiment the server executes the speech recognition process and the confirmation action determination process, but these processes may instead be executed by the robot. In that case, the speech recognition dictionary data (speech recognition local dictionary data), the article dictionary data (article local dictionary data), the utterance dictionary data, and the personal correct/incorrect information data must be stored in the robot's internal memory or in an external memory accessible by the robot.

DESCRIPTION OF SYMBOLS
10 ... Communication system
12 ... Communication robot
14 ... Network
18 ... Wireless tag
20 ... Server
24 ... Article (book)
80 ... CPU
120 ... Camera
124 ... Antenna
200 ... CPU
208 ... Wireless tag reader

Claims

1. A communication system that identifies, by voice recognition, an article designated by a person and confirms by voice whether the identified article is the one designated by the person, the communication system comprising:
storage means for storing a name of an article and a plurality of words related to the article;
first creation means for reading a plurality of words related to the identified article from the storage means and creating a first power set of the read words;
second creation means for reading, from the storage means, a plurality of words related to other articles existing in the vicinity of the identified article and creating, for each of the other articles, a second power set of the read words;
deletion means for deleting, from the first power set, elements common to the first power set created by the first creation means and each of the second power sets created by the second creation means;
first selection means for selecting, from among the elements remaining in the first power set after deletion by the deletion means, an element having the smallest number of words;
second selection means for selecting, from among the elements selected by the first selection means, an element having the highest degree of dissimilarity with the character strings of the plurality of words related to the other articles; and
utterance content generation means for generating utterance content for the voice confirmation, the utterance content including a word included in the element selected by the second selection means.

2. The communication system according to claim 1, further comprising calculation means for calculating a Levenshtein distance between the character string of a first word included in the element selected by the first selection means and each of the character strings of a plurality of second words related to the other articles,
wherein the second selection means selects an element including the first word for which the Levenshtein distance calculated by the calculation means is largest.

3. The communication system according to claim 1, further comprising:
first determination means for determining whether the other article exists in the vicinity of the identified article;
candidate determination means for determining, when the first determination means determines that the other article does not exist in the vicinity of the identified article, each of the plurality of words related to the identified article as a candidate to be included in the utterance content; and
third selection means for selecting, from among the words determined as candidates by the candidate determination means, a word having the highest degree of dissimilarity with the character strings of a plurality of words related to other articles existing in the vicinity of the person,
wherein the utterance content generation means generates utterance content for the voice confirmation that includes a word included in the element selected by the third selection means.

4. The communication system according to claim 3, further comprising:
second determination means for determining, when the first determination means determines that the other article does not exist in the vicinity of the identified article, whether another article exists in the vicinity of the person; and
fourth selection means for selecting, when the second determination means determines that no other article exists in the vicinity of the person, one word from the candidates determined by the candidate determination means according to a predetermined rule,
wherein the utterance content generation means generates utterance content for the voice confirmation that includes a word included in the element selected by the fourth selection means.

5. An utterance content generation device that is used in a communication system that identifies, by voice recognition, an article designated by a person and confirms by voice whether the identified article is the one designated by the person, and that generates utterance content for the voice confirmation, the utterance content generation device comprising:
storage means for storing a name of an article and a plurality of words related to the article;
first creation means for reading a plurality of words related to the identified article from the storage means and creating a first power set of the read words;
second creation means for reading, from the storage means, a plurality of words related to other articles existing in the vicinity of the identified article and creating, for each of the other articles, a second power set of the read words;
deletion means for deleting, from the first power set, elements common to the first power set created by the first creation means and each of the second power sets created by the second creation means;
first selection means for selecting, from among the elements remaining in the first power set after deletion by the deletion means, an element having the smallest number of words;
second selection means for selecting, from among the elements selected by the first selection means, an element having the highest degree of dissimilarity with the character strings of the plurality of words related to the other articles; and
utterance content generation means for generating utterance content for the voice confirmation, the utterance content including a word included in the element selected by the second selection means.

6. An utterance content generation program for an utterance content generation device that is used in a communication system that identifies, by voice recognition, an article designated by a person and confirms by voice whether the identified article is the one designated by the person, that comprises storage means for storing a name of an article and a plurality of words related to the article, and that generates utterance content for the voice confirmation, the program causing a processor of the utterance content generation device to execute:
a first creation step of reading a plurality of words related to the identified article from the storage means and creating a first power set of the read words;
a second creation step of reading, from the storage means, a plurality of words related to other articles existing in the vicinity of the identified article and creating, for each of the other articles, a second power set of the read words;
a deletion step of deleting, from the first power set, elements common to the first power set created in the first creation step and each of the second power sets created in the second creation step;
a first selection step of selecting, from among the elements remaining in the first power set after deletion in the deletion step, an element having the smallest number of words;
a second selection step of selecting, from among the elements selected in the first selection step, an element having the highest degree of dissimilarity with the character strings of the plurality of words related to the other articles; and
an utterance content generation step of generating utterance content for the voice confirmation, the utterance content including a word included in the element selected in the second selection step.

7. An utterance content generation method for an utterance content generation device that is used in a communication system that identifies, by voice recognition, an article designated by a person and confirms by voice whether the identified article is the one designated by the person, that comprises storage means for storing a name of an article and a plurality of words related to the article, and that generates utterance content for the voice confirmation, wherein a processor of the utterance content generation device:
(a) reads a plurality of words related to the identified article from the storage means and creates a first power set of the read words;
(b) reads, from the storage means, a plurality of words related to other articles existing in the vicinity of the identified article and creates, for each of the other articles, a second power set of the read words;
(c) deletes, from the first power set, elements common to the first power set created in step (a) and each of the second power sets created in step (b);
(d) selects, from among the elements remaining in the first power set after deletion in step (c), an element having the smallest number of words;
(e) selects, from among the elements selected in step (d), an element having the highest degree of dissimilarity with the character strings of the plurality of words related to the other articles; and
(f) generates utterance content for the voice confirmation, the utterance content including a word included in the element selected in step (e).