JP7317403B2

JP7317403B2 - Processing device, processing method, and program

Info

Publication number: JP7317403B2
Application number: JP2022014985A
Authority: JP
Inventors: Tamami Nakano (中野珠実)
Original assignee: Japan Science and Technology Agency
Current assignee: Japan Science and Technology Agency
Priority date: 2018-03-22
Filing date: 2022-02-02
Publication date: 2023-07-31
Anticipated expiration: 2038-03-22
Also published as: JP2022063279A

Description

The present invention relates to technology for supporting communication between a person and an interactive device.

Techniques for realizing pseudo-communication (hereinafter simply "communication") between a person and an interactive device have been proposed in the past. Patent Literature 1 discloses an interactive toy. Interactive devices that perform motions imitating eye blinks have also been proposed. Patent Literature 2 discloses opening and closing the eyes of a CG character at preset blink intervals. Patent Literature 3 discloses a robot that performs blinking motions at timings exponentially distributed over time, starting from the timing of a head-nodding motion. Regarding the blinks of speakers and listeners, the inventors of the present application have disclosed the following in Non-Patent Literature 1 and Non-Patent Literature 2. Non-Patent Literature 1 discloses that the blinks of a speaker and a listener synchronize with a time delay, and that a speaker's blinks increase toward the end of a story or at pauses between utterances. Non-Patent Literature 2 discloses that blinks synchronize with a time delay between a speaker that is a robot and a listener that is a human.

Patent Literature 1: Japanese Translation of PCT International Application Publication No. 2010-525848
Patent Literature 2: Japanese Patent No. 5639440
Patent Literature 3: Japanese Unexamined Patent Application Publication No. 2000-349920

Non-Patent Literature 1: Tamami Nakano and Shigeru Kitazawa, "Eyeblink entrainment at breakpoints of speech", Experimental Brain Research, 205(4), p.577-81, [online], [searched March 12, 2018], Internet <URL: https://www.ncbi.nlm.nih.gov/pubmed/20700731>, (2010)
Non-Patent Literature 2: Kyohei Tatsukawa, Tamami Nakano, Hiroshi Ishiguro and Yuichiro Yoshikawa, "Eyeblink Synchrony in Multimodal Human-Android Interaction", Scientific Reports, 6:39718, [online], [searched March 12, 2018], Internet <URL: https://www.nature.com/articles/srep39718>, (2016)

When an interactive device performs blinking motions that are unrelated to the communication between the user and the interactive device, the blinking may contribute little to improving the quality of that communication.

Accordingly, an object of the present invention is to support communication between an interactive device and a user by using blinking motions.

An embodiment of the present invention provides a processing device having an environment information acquisition unit that acquires environment information indicating the environment around an interactive device, and a blinking motion control unit that controls the frequency of blinking motions performed by the interactive device based on the environment information.
In the above processing device, the blinking motion control unit may learn the relationship between the environment around the interactive device and the blinks that humans perform in that environment, and may cause the interactive device to perform blinking motions based on that relationship.
In the above processing device, the environment information may be dialogue information from a user to the interactive device.
In the above processing device, the blinking motion control unit may increase the frequency of blinking motions performed by the interactive device when the environment information satisfies a predetermined condition.
In the above processing device, the blinking motion control unit may increase the frequency of blinking motions when it determines, based on the learning result, that the dialogue information has turned to a highly unexpected topic.
An embodiment of the present invention provides a processing method executed by a computer, the method acquiring environment information indicating the environment around an interactive device and controlling the frequency of blinking motions performed by the interactive device based on the environment information.
An embodiment of the present invention provides a program for causing a computer to acquire environment information indicating the environment around an interactive device and to control the frequency of blinking motions performed by the interactive device based on the environment information.
An embodiment of the present invention provides a processing device having a first acquisition unit that acquires a first blinking motion count, which is the number of blinking motions performed by an interactive device in a specific period or scene, a second acquisition unit that acquires a second blinking motion count, which is the number of blinks performed by a user of the interactive device in the specific period, and a processing unit that performs dialogue processing according to the first blinking motion count and the second blinking motion count in a predetermined period.
In the above processing device, the processing unit may calculate an evaluation value based on the first blinking motion count and the second blinking motion count in the specific period or scene, and may perform dialogue processing based on the relationship between the evaluation value and a predetermined threshold.
In the above processing device, the processing unit may change the current topic when the evaluation value is less than the predetermined threshold.
In the above processing device, the processing unit may end the dialogue processing when the evaluation value is less than the predetermined threshold.
In the above processing device, the processing unit may continue the current topic when the evaluation value is equal to or greater than the predetermined threshold.
An embodiment of the present invention provides a processing method executed by a computer, the method acquiring a first blinking motion count of an interactive device in a specific period or scene, acquiring a second blinking motion count of a user of the interactive device in the specific period, and performing dialogue processing according to the first blinking motion count and the second blinking motion count in a predetermined period.
An embodiment of the present invention provides a program for causing a computer to acquire a first blinking motion count of an interactive device in a specific period or scene, acquire a second blinking motion count of a user of the interactive device in the specific period, and perform dialogue processing according to the first blinking motion count and the second blinking motion count in a predetermined period.
An embodiment of the present invention provides a processing device having a first acquisition unit that acquires the timing of a blinking motion of an interactive device, a second acquisition unit that acquires the timing of a blink of a user of the interactive device, and a processing unit that performs processing according to the difference between the timing of the blinking motion and the timing of the user's blink.

In the above processing device, the processing unit may perform processing according to an index value based on the difference.

In the above processing device, the processing unit may perform processing according to the degree to which the timing of the user's blinks falls within a predetermined period that depends on the timing of the blinking motion.

In the above processing device, the predetermined period may include a time point 500 milliseconds or less from the timing of the blinking motion.
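As an illustration of the "degree" described above, the following minimal sketch counts the fraction of user blinks that fall within a fixed window after a blinking motion of the device. The function name `fraction_within_window`, the use of seconds as the time unit, and the choice of a window that starts at the blinking motion and lasts 0.5 seconds are assumptions made for this example; the text only states that the predetermined period may include a time point 500 milliseconds or less from the blinking motion.

```python
from bisect import bisect_right


def fraction_within_window(device_blinks, user_blinks, window_s=0.5):
    """Fraction of user blinks occurring within window_s seconds after a device blink.

    device_blinks, user_blinks: blink times in seconds (hypothetical input format).
    """
    device = sorted(device_blinks)
    if not user_blinks:
        return 0.0
    hits = 0
    for t in user_blinks:
        # Index of the first device blink strictly later than t.
        i = bisect_right(device, t)
        # device[i - 1] is the most recent device blink at or before t.
        if i > 0 and t - device[i - 1] <= window_s:
            hits += 1
    return hits / len(user_blinks)


if __name__ == "__main__":
    device = [1.0, 4.0, 9.0]
    user = [1.3, 2.5, 4.45, 9.6]
    print(fraction_within_window(device, user))
```

In this example two of the four user blinks (at 1.3 s and 4.45 s) follow a device blink by no more than 0.5 s, so the degree is one half.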

In the above processing device, the processing unit may perform the processing according to the degree in first data, in which the timings of the user's blinks and the timings of the blinking motions are arranged in chronological order on a predetermined time axis, and the degree in second data, in which the order of at least one of the timings of the user's blinks and the timings of the blinking motions has been changed.
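The comparison between first data and second data can be sketched as follows. This is only one possible reading: the second data is built here by shuffling the order of the user's inter-blink intervals, and the two degrees are compared through a z-score against many shuffled copies. The interval shuffling, the z-score, and all names (`degree`, `shuffled_copy`, `compare_with_surrogates`) are assumptions for illustration, not details stated in this passage.

```python
import random
from statistics import mean, pstdev


def degree(device_blinks, user_blinks, window_s=0.5):
    """Fraction of user blinks within window_s seconds after some device blink."""
    if not user_blinks:
        return 0.0
    hits = sum(
        1 for u in user_blinks
        if any(0.0 <= u - d <= window_s for d in device_blinks)
    )
    return hits / len(user_blinks)


def shuffled_copy(blinks, rng):
    """Second data: the same inter-blink intervals, placed in a shuffled order."""
    ordered = sorted(blinks)
    if len(ordered) < 2:
        return ordered
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    rng.shuffle(intervals)
    out = [ordered[0]]
    for gap in intervals:
        out.append(out[-1] + gap)
    return out


def compare_with_surrogates(device_blinks, user_blinks, n_surrogates=200, seed=0):
    """Return (observed degree, mean surrogate degree, z-score)."""
    rng = random.Random(seed)
    observed = degree(device_blinks, user_blinks)
    surrogate = [
        degree(device_blinks, shuffled_copy(user_blinks, rng))
        for _ in range(n_surrogates)
    ]
    mu, sigma = mean(surrogate), pstdev(surrogate)
    z = (observed - mu) / sigma if sigma > 0 else 0.0
    return observed, mu, z


if __name__ == "__main__":
    device = [1.0, 4.0, 9.0, 15.0]
    user = [1.2, 4.3, 6.0, 9.4, 15.1]
    print(compare_with_surrogates(device, user))
```

Shuffling intervals keeps the overall blink rate unchanged while destroying the temporal relationship to the device, so a degree that is clearly higher in the first data than in the second suggests that the alignment is not explained by blink rate alone.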

In the above processing device, the processing unit may cause the interactive device to perform dialogue processing according to the difference.

In the above processing device, the processing unit may output evaluation data according to the difference in association with an identifier of the interactive device.

The above processing device may have an environment information acquisition unit that acquires environment information indicating the environment around the interactive device, and a blinking motion control unit that causes the interactive device to perform a blinking motion at a first timing according to the environment information.

The above processing device may have a storage control unit that accumulates, in a storage device, data associating the timing at which the user blinks with the environment, and the blinking motion control unit may set the first timing according to the data accumulated in the storage device and the environment information.

In the above processing device, the blinking motion control unit may further cause the interactive device to perform a blinking motion at a second timing different from the first timing.

The above processing device may have an eyelid portion corresponding to an eyelid, and a blinking motion control unit that controls the blinking motion of the eyelid portion by opening and closing the eyelid portion, and the first acquisition unit may acquire the timing of the blinking motion of the eyelid portion.

The above processing device may have a display unit, and a blinking motion control unit that controls the blinking motion of an object displayed on the display unit, and the first acquisition unit may acquire the timing of the blinking motion of the object.

An embodiment of the present invention provides a processing method that acquires the timing of a blinking motion of an interactive device and the timing of a blink of a user of the interactive device, and performs processing according to the difference between the timing of the blinking motion and the timing of the user's blink.

An embodiment of the present invention provides a program for causing a computer to realize a function of acquiring the timing of a blinking motion of an interactive device and the timing of a blink of a user of the interactive device, and performing processing according to the difference between the timing of the blinking motion and the timing of the user's blink.

According to the present invention, communication between an interactive device and a user can be supported by using blinking motions.

A diagram showing an example of the external configuration of the interactive device according to the first embodiment of the present invention.
A diagram illustrating the blinking motion performed by the interactive device according to the first embodiment of the present invention.
A block diagram showing the hardware configuration of the interactive device according to the first embodiment of the present invention.
A block diagram showing the functional configuration of the interactive device according to the first embodiment of the present invention.
A flowchart showing the processing executed by the interactive device according to the first embodiment of the present invention.
A flowchart showing the index calculation processing according to the first embodiment of the present invention.
A diagram illustrating the method of calculating timing differences according to the first embodiment of the present invention.
A graph showing the distribution of the frequency of occurrence of timing differences according to the first embodiment of the present invention.
A diagram showing an example of random data according to the first embodiment of the present invention.
A diagram showing an example of evaluation values according to the first embodiment of the present invention.
A graph showing the distribution of the frequency of listeners' blinks in the verification of the first embodiment of the present invention.
A graph showing the distribution of the frequency of listeners' blinks for each answer result in the verification of the first embodiment of the present invention.
A graph showing the evaluation values in the verification of the first embodiment of the present invention by gender and by product.
A graph showing the degree of interest in products in the verification of the first embodiment of the present invention by gender and by product.
A block diagram showing the functional configuration of the interactive device according to the second embodiment of the present invention.
A flowchart showing the learning processing executed by the interactive device according to the second embodiment of the present invention.
A flowchart showing the processing related to the blinking motion executed by the interactive device according to the second embodiment of the present invention.
A block diagram showing the hardware configuration of the interactive device according to the third embodiment of the present invention.
A block diagram showing the functional configuration of the interactive device according to the third embodiment of the present invention.
A flowchart showing the processing executed by the interactive device according to the third embodiment of the present invention.
A block diagram showing the functional configuration of a processing device according to an embodiment of the present invention.

An embodiment of the present invention will be described in detail below with reference to the drawings. The embodiments shown below are examples of embodiments of the present invention, and the present invention is not limited to these embodiments. In the drawings referred to in the embodiments, identical parts or parts having similar functions are given the same or similar reference signs (reference signs consisting of a number followed only by A, B, and so on), and repeated description of them may be omitted.

The inventor found that the difference in blink timing between a speaker and a listener can be used to evaluate the quality of the communication between that speaker and listener. For example, when the blink timings of the speaker and the listener match to a high degree, it can be inferred that the listener is showing strong interest in what the speaker says. Conversely, when the degree of matching is low, it can be inferred that the listener is showing little interest in what the speaker says. The verification that led to this finding is described later. The following describes embodiments in which this finding is applied to technology for realizing communication between an interactive device and its user.

[First embodiment]
FIG. 1 is a diagram showing an example of the external configuration of an interactive device 10 according to the first embodiment of the present invention. The interactive device 10 is a processing device that interacts with a user U. The user U is a user of the interactive device 10. The user U faces the interactive device 10 and communicates with it through dialogue.

The interactive device 10 is a robot whose appearance imitates a living thing. The interactive device 10 has, for example, an appearance imitating a human, another animal (for example, a dog or a cat), or a fictional character (for example, an animation character). The appearance of the interactive device 10 is not limited to these.

The interactive device 10 has a face portion 101 and eyelid portions 102. The face portion 101 is a part corresponding to a face. The eyelid portions 102 are arranged on the face portion 101 and are parts corresponding to the eyelids of eyes. By opening and closing, the eyelid portions 102 perform a motion imitating the blinking of eyes (hereinafter "blinking motion"). In this embodiment, the two eyelid portions 102 perform the same motion.

FIG. 2 is a diagram illustrating the blinking motion of the interactive device 10. In its normal state, the interactive device 10 keeps the eyelid portions 102 open. At the timing of a blinking motion, the interactive device 10 moves the eyelid portions 102 from the open state to the closed state (arrow A1) and then from the closed state back to the open state (arrow A2). The timings of the transitions from open to closed and from closed to open are set in advance so that the motion of the eyelid portions 102 imitates the blinking of eyes.
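The two transitions of one blinking motion can be sketched as a short list of timed events. The durations `closing_s` and `opening_s` and the names `EyelidEvent` and `blink_events` are hypothetical values chosen for this example; the text only states that the transition timings are set in advance so that the motion resembles a blink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EyelidEvent:
    time_s: float  # time at which the eyelid reaches the given state
    state: str     # "closed" or "open"


def blink_events(blink_time_s, closing_s=0.1, opening_s=0.15):
    """Events for one blinking motion: open to closed (A1), then closed to open (A2).

    closing_s and opening_s are assumed preset durations, not values from the text.
    """
    closed_at = blink_time_s + closing_s
    open_at = closed_at + opening_s
    return [EyelidEvent(closed_at, "closed"), EyelidEvent(open_at, "open")]


if __name__ == "__main__":
    for event in blink_events(2.0):
        print(event)
```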

The interactive device 10 may have a mouth, a nose, and other parts on the face portion 101. The interactive device 10 may also move each of these parts arranged on the face portion 101, but a description of this is omitted in this embodiment.

The installation location and the use of the interactive device 10 are not particularly limited. The interactive device 10 is installed in, for example, a commercial facility (for example, a store), a public facility, or another facility. In this case, the user U is a user of that facility. The interactive device 10 may also be used for medical purposes, as a toy, or for other purposes.

FIG. 3 is a block diagram showing the hardware configuration of the interactive device 10. The interactive device 10 has a control unit 11, a voice input unit 12, a voice output unit 13, a storage unit 14, an imaging unit 15, and the eyelid portions 102. The control unit 11 controls each part of the interactive device 10. The control unit 11 includes, for example, an arithmetic processing device exemplified by a CPU, and a memory. The memory includes, for example, a RAM that the arithmetic processing device uses as a work area and a ROM that stores a control program.

The voice input unit 12 accepts voice input. The voice input unit 12 converts the accepted voice into a voice signal and supplies it to the control unit 11. The voice input unit 12 includes, for example, a microphone, an A (Analog)/D (Digital) conversion circuit, and a filter.

The voice output unit 13 outputs voice. The voice output unit 13 outputs sound converted from the voice signal supplied from the control unit 11. The voice output unit 13 includes, for example, a D/A conversion circuit and a speaker.

The storage unit 14 stores data. The storage unit 14 stores, for example, a program 141 and dialogue data 142. The program 141 is a program for causing the control unit 11 to realize predetermined functions.

The dialogue data 142 is data that the interactive device 10 uses to converse with the user U. The dialogue data 142 stores, for example, a plurality of records in each of which input data and output data are associated with each other. The input data is data representing, as a character string, the content of an utterance that the user U is expected to make. The output data is data representing, as a character string, the content of the response to that utterance. For example, when the input data is "What is your name?", the output data associated with that input data is "My name is XX." ("XX" is the name of the interactive device 10).

The dialogue data 142 may include an identifier that identifies a topic, in association with the input data and the output data. For example, a first identifier "ID001" is associated with input data and output data used for utterances about soccer. A second identifier "ID002" is associated with input data and output data used for utterances about meals.
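One way to picture the dialogue data 142 is as a list of records that pair an expected utterance with a response and, optionally, a topic identifier. The sketch below is only an illustration: the class `DialogueEntry`, the function `respond`, the exact-match lookup, and the soccer and meal sentences are all hypothetical. Only the name exchange and the identifiers "ID001" and "ID002" come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DialogueEntry:
    input_text: str                 # utterance the user is expected to make
    output_text: str                # response associated with that utterance
    topic_id: Optional[str] = None  # optional topic identifier


# Example records; the soccer and meal sentences are invented for illustration.
DIALOGUE_DATA = [
    DialogueEntry("What is your name?", "My name is XX."),
    DialogueEntry("Do you like soccer?", "Yes, I enjoy watching soccer.", "ID001"),
    DialogueEntry("What did you eat today?", "I had rice for lunch.", "ID002"),
]


def respond(recognized_text, entries=DIALOGUE_DATA):
    """Return the entry whose input data matches the recognized text, or None."""
    for entry in entries:
        if entry.input_text == recognized_text:
            return entry
    return None


if __name__ == "__main__":
    entry = respond("Do you like soccer?")
    print(entry.output_text, entry.topic_id)
```

Keeping the topic identifier on each record makes it straightforward to select records for a different topic when the dialogue processing decides to change the subject.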

The dialogue data 142 may be data in another format. The storage unit 14 may include a recording medium (storage device) of any type, exemplified by optical recording media, magnetic recording media, and semiconductor recording media.

The imaging unit 15 captures an image of a subject and generates imaging data representing the captured image. The subject is the user U. The imaging unit 15 includes, for example, an imaging element exemplified by a CCD (Charge Coupled Device) image sensor, and a lens. The lens of the imaging unit 15 is provided, for example, near the eyelid portions 102 on the face portion 101, but may be provided at another position on the face portion 101 or at a position other than the face portion 101.

The eyelid portions 102 open and close under the control of the control unit 11. The eyelid portions 102 include, for example, an opening/closing mechanism (for example, a diaphragm and a cylinder) and a drive circuit that drives the opening/closing mechanism. Various known techniques can be applied to the mechanism for realizing the blinking motion.

FIG. 4 is a block diagram showing the functional configuration of the interactive device 10. By executing the program 141, the control unit 11 of the interactive device 10 realizes functions corresponding to a blinking motion control unit 111, a first acquisition unit 112, a blink detection unit 113, a second acquisition unit 114, and a processing unit 115.

The blinking motion control unit 111 controls the blinking motion of the interactive device 10 by opening and closing the eyelid portions 102. The blinking motion control unit 111 outputs, for example, blink control data for performing a blinking motion to the eyelid portions 102. The eyelid portions 102 open and close according to the blink control data.

The first acquisition unit 112 acquires the timing of the blinking motion of the interactive device 10 (eyelid portions 102). The first acquisition unit 112 acquires, for example, data indicating the timing of the blinking motion from the blinking motion control unit 111.

The blink detection unit 113 detects blinks of the user U. Specifically, the blink detection unit 113 detects blinks of the user U based on the imaging data generated by the imaging unit 15.

The second acquisition unit 114 acquires the timing of the blinks of the user U. In this embodiment, the second acquisition unit 114 acquires data indicating the timing of the blinks of the user U (hereinafter "blink data"). The blink data is data (first data) in which the times at which blinks were detected are arranged in chronological order. The second acquisition unit 114 generates the blink data based on, for example, the results of blink detection by the blink detection unit 113.

The processing unit 115 performs processing according to the difference between the timing of the blinking motion of the interactive device 10 and the timing of the blinks of the user U. In this embodiment, the processing unit 115 performs dialogue processing. The dialogue processing is processing for conversing by using the dialogue data 142. Specifically, the dialogue processing includes processing that recognizes the voice input via the voice input unit 12 and converts it into input data. The dialogue processing also includes processing that converts the output data associated with that input data into voice and outputs it via the voice output unit 13.

Next, the operation of the interactive device 10 will be described. FIG. 5 is a flowchart showing the processing executed by the interactive device 10.

The processing unit 115 starts the dialogue processing (step S1). The trigger for starting the dialogue processing is not limited. The processing unit 115 may start the dialogue processing at the timing at which it recognizes the presence of the user U. The user of the interactive device 10 is, for example, a person recognized from the image represented by the imaging data, a person located in the imaging direction of the imaging unit 15, or a person located at a position corresponding to the position of the imaging unit 15. The user of the interactive device 10 may also be a person who has logged in to the interactive device 10. The processing unit 115 may also start the dialogue processing when it recognizes a predetermined voice (for example, a voice indicating a greeting) in the voice input via the voice input unit 12, or when it accepts a predetermined operation.

Next, the processing unit 115 causes the imaging unit 15 to start imaging (step S2). Once imaging has started, the interactive device 10 performs the processing described below.

まず、瞬き動作に関する処理を説明する。瞬き動作制御部１１１は、瞬き動作をするかどうかを判断する（ステップＳ１１）。瞬き動作制御部１１１は、例えば、対話装置１０の発話中、又は発話が終了したタイミングで、瞬き動作をすると判断する。発話が終了したタイミングは、例えば、話の切れ目となるタイミングである。瞬き動作のタイミングは、ランダムなタイミングを含んでもよい。 First, the processing related to the blinking motion will be described. The blinking motion control unit 111 determines whether or not to perform a blinking motion (step S11). The blinking motion control unit 111 determines to perform a blinking motion, for example, while the interactive device 10 is speaking or at the timing at which its speech ends. The timing at which the speech ends is, for example, a timing that forms a break in the talk. The timings of the blinking motion may include random timings.

ステップＳ１１で「ＹＥＳ」と判断した場合、瞬き動作制御部１１１は、対話装置１０に瞬き動作をさせる（ステップＳ１２）。ステップＳ１１で「ＮＯ」と判断した場合、瞬き動作制御部１１１は、瞬き動作をしない。そして、対話装置１０の処理はステップＳ３に進む。 If it is determined as "YES" in step S11, the blinking motion control section 111 causes the interactive device 10 to perform a blinking motion (step S12). If it is determined "NO" in step S11, the blinking motion control section 111 does not perform the blinking motion. Then, the processing of the interactive device 10 proceeds to step S3.

次に、対話装置１０とユーザとのコミュニケーションの質の評価に関する処理を説明する。 Next, the process for evaluating the quality of communication between the interactive device 10 and the user will be described.

第１取得部１１２は、対話装置１０の瞬き動作のタイミングを取得する（ステップＳ２１）。次に、瞬き検出部１１３は、撮像部１５から供給された撮像データに基づいて、ユーザＵの瞬きを検出する（ステップＳ２２）。瞬きの検出のアルゴリズムは、種々の公知技術が適用されてよい。瞬き検出部１１３は、例えば、撮像データが示す画像から、ユーザＵの目の周縁に沿って複数の特徴点を抽出する。瞬き検出部１１３は、例えば、Ｈａａｒ－ｌｉｋｅに基づいて特徴点を抽出する。瞬き検出部１１３は、複数のフレームの撮像データに基づいて、抽出した特徴点の移動の方向、及びその速度の時間的な変化を特定することにより、ユーザＵの瞬きの有無を検出する。例えば人の瞬きに起因して、およそ０～３００ミリ秒の間に、特徴点の急激な速度の変化が生じる。そこで、瞬き検出部１１３は、所定期間内の速度変化が閾値以上となった場合、ユーザＵの瞬きがあったことを検出する。 The first acquisition unit 112 acquires the timing of the blinking motion of the interactive device 10 (step S21). Next, the blink detection unit 113 detects blinks of the user U based on the imaging data supplied from the imaging unit 15 (step S22). Various known techniques may be applied as the blink detection algorithm. For example, the blink detection unit 113 extracts a plurality of feature points along the periphery of the user U's eyes from the image indicated by the imaging data. The blink detection unit 113 extracts the feature points based on, for example, Haar-like features. The blink detection unit 113 detects whether or not the user U has blinked by identifying, from the imaging data of a plurality of frames, the direction of movement of the extracted feature points and the temporal change in their speed. For example, a human blink causes an abrupt change in the speed of the feature points over a span of approximately 0 to 300 milliseconds. The blink detection unit 113 therefore detects that the user U has blinked when the change in speed within a predetermined period is greater than or equal to a threshold.
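The threshold rule in step S22 can be illustrated with a minimal Python sketch. It is not the detection algorithm of the embodiment: it assumes the feature points have already been reduced to a per-frame eye-opening value (1 = fully open, 0 = closed), and the function name `detect_blinks`, the 0.3 s window, and the threshold value are hypothetical choices made only for illustration.

```python
import numpy as np

def detect_blinks(eye_opening, fps, window_s=0.3, threshold=10.0):
    """Return blink times in seconds (hypothetical helper, not from the patent).

    eye_opening: per-frame eye-opening values (assumed input, 1 = open, 0 = closed).
    A blink is reported when the velocity range inside a sliding window of
    window_s seconds is at least `threshold` (opening units per second).
    """
    opening = np.asarray(eye_opening, dtype=float)
    velocity = np.diff(opening) * fps           # velocity[k]: change from frame k to k+1
    window = max(1, int(round(window_s * fps)))
    blink_times = []
    i = 0
    while i < len(velocity):
        segment = velocity[i:i + window]
        if segment.max() - segment.min() >= threshold:
            # Report the frame at which the fastest movement in the window ends.
            peak = i + int(np.argmax(np.abs(segment)))
            blink_times.append((peak + 1) / fps)
            i = peak + window                   # skip the remainder of this blink
        else:
            i += 1
    return blink_times
```

The returned times would play the role of the blink data acquired in step S23; a real implementation would first have to track the eye feature points, which this sketch leaves out.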

次に、第２取得部１１４は、瞬きの検出結果に基づいて、ユーザＵの瞬きのタイミングを示す瞬きデータを取得する（ステップＳ２３）。 Next, the second acquisition unit 114 acquires blink data indicating the blink timing of the user U based on the blink detection result (step S23).

次に、処理部１１５は、指標算出処理を行う（ステップＳ２４）。指標算出処理は、対話装置１０とユーザＵとのコミュニケーションの質の指標を算出する処理である。 Next, the processing unit 115 performs index calculation processing (step S24). The index calculation processing is processing for calculating an index of the quality of communication between the interactive device 10 and the user U.

図６は、指標算出処理を示すフローチャートである。以下、指標算出処理を、具体例を挙げて説明する。 FIG. 6 is a flowchart showing index calculation processing. The index calculation process will be described below with specific examples.

まず、処理部１１５は、ユーザＵの瞬きのタイミングと、対話装置１０の瞬き動作のタイミングとの差異を算出する（ステップＳ２４１）。処理部１１５は、例えば、所定期間におけるユーザＵの瞬きのタイミングと、瞼部１０２の瞬き動作のタイミングとのすべての組み合わせについて差異（以下「タイミング差」という。）を算出する。所定期間は、例えば３０秒であるが、３０秒未満であってもよいし、３０秒よりも長くてもよい。所定期間は、例えば、対話装置１０の発話が終了するタイミングから所定時間前まで遡った期間の全体又は一部の期間である。 First, the processing unit 115 calculates the difference between the blink timing of the user U and the blinking-motion timing of the interactive device 10 (step S241). For example, the processing unit 115 calculates the difference (hereinafter referred to as the "timing difference") for every combination of a blink timing of the user U and a blinking-motion timing of the eyelids 102 within a predetermined period. The predetermined period is, for example, 30 seconds, but may be shorter or longer than 30 seconds. The predetermined period is, for example, all or part of the period extending back a predetermined time from the timing at which the speech of the interactive device 10 ends.

図７は、タイミング差の算出方法を説明する図である。図７に示すタイミングチャートは、ユーザＵの瞬きのタイミング、及び対話装置１０が瞬き動作をしたタイミングを示す。図７に示すように、ユーザＵの瞬きのタイミングを、その時刻順に、ｔ１１，ｔ１２，・・・ｔ１Ｂと表す。対話装置１０の瞬き動作のタイミングを、その時刻順に、ｔ２１，ｔ２２，・・・ｔ２７と表す。この場合、処理部１１５は、ｔ１１，ｔ１２，・・・ｔ１Ｂの各々について、ｔ２１，ｔ２２，・・・ｔ２７の各々との差異を算出する。瞬きのタイミングｔ１ｉと瞬き動作のタイミングｔ２ｊとの差異であるタイミング差を、以下「Δｔｉｊ」と表す。この場合、処理部１１５は、タイミング差ＴＤ｛Δｔ１１、Δｔ１２、・・・、Δｔ１７、Δｔ２１、Δｔ２２、・・・、Δｔ２７、・・・ΔｔＢ１、ΔｔＢ２、・・・、ΔｔＢ７｝を算出する。 FIG. 7 is a diagram explaining a method of calculating the timing differences. The timing chart shown in FIG. 7 shows the blink timings of the user U and the timings at which the interactive device 10 performed a blinking motion. As shown in FIG. 7, the blink timings of the user U are denoted, in order of time, t11, t12, ..., t1B. The blinking-motion timings of the interactive device 10 are denoted, in order of time, t21, t22, ..., t27. In this case, the processing unit 115 calculates, for each of t11, t12, ..., t1B, the difference from each of t21, t22, ..., t27. The timing difference between the blink timing t1i and the blinking-motion timing t2j is hereinafter denoted "Δtij". In this case, the processing unit 115 calculates the timing differences TD {Δt11, Δt12, ..., Δt17, Δt21, Δt22, ..., Δt27, ..., ΔtB1, ΔtB2, ..., ΔtB7}.
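As an illustration of step S241, the all-pairs timing differences can be sketched in a few lines of Python. The function name `timing_differences` is hypothetical, and the sign convention (user blink time minus device blink time, so that a positive value means the user blinked later) is taken from the description of FIG. 10.

```python
import numpy as np

def timing_differences(user_blinks, device_blinks):
    """Return every difference t1i - t2j as a flat array (hypothetical helper).

    user_blinks:   blink times t11, t12, ... of the user, in seconds.
    device_blinks: blinking-motion times t21, t22, ... of the device, in seconds.
    """
    user = np.asarray(user_blinks, dtype=float)
    device = np.asarray(device_blinks, dtype=float)
    # Broadcasting builds a (len(user), len(device)) matrix whose entry
    # [i, j] is user[i] - device[j]; ravel() lists it row by row.
    return (user[:, None] - device[None, :]).ravel()
```

For m user blinks and n device blinks the result has m × n entries, ordered as Δt11, Δt12, ... in the notation above.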

図８は、タイミング差の出現頻度の分布を示すグラフＤＧを示す。図８のグラフＤＧにおいて、横軸がタイミング差に対応し、縦軸が各タイミング差の出現の度合い（つまり、出現頻度）に対応する。図８に示す例では、或る時間範囲Ｔ内で、出現頻度が高くなっている。 FIG. 8 shows a graph DG showing the distribution of the appearance frequency of timing differences. In the graph DG of FIG. 8, the horizontal axis corresponds to the timing difference, and the vertical axis corresponds to the degree of appearance of each timing difference (that is, appearance frequency). In the example shown in FIG. 8, within a certain time range T, the appearance frequency is high.

ところで、グラフＤＧで示される出現頻度の分布は、ユーザＵと対話装置１０とのコミュニケーションだけでなく、ユーザＵの瞬きの特性（例えば、回数及び頻度）、及び対話装置１０の瞬き動作の特性（例えば、回数及び頻度）に起因して生じたと考えられる。例えば、ユーザＵの瞬きの頻度、及び対話装置１０の瞬き動作の頻度が高い場合ほど、小さいタイミング差の出現頻度が高くなりやすい。このため、タイミング差ＴＤが、どの程度、対話装置１０とユーザＵとのコミュニケーションに起因して生じたかを明らかにする必要がある。そこで、処理部１１５は、サロゲートデータ法に基づいて、出現頻度の分布を解析する。 Incidentally, the distribution of appearance frequencies shown in the graph DG is considered to result not only from the communication between the user U and the interactive device 10, but also from the characteristics of the user U's blinks (for example, their number and frequency) and the characteristics of the blinking motion of the interactive device 10 (for example, its number and frequency). For example, the higher the blink frequency of the user U and the blinking-motion frequency of the interactive device 10, the more often small timing differences tend to appear. It is therefore necessary to clarify to what extent the timing differences TD result from the communication between the interactive device 10 and the user U. For this purpose, the processing unit 115 analyzes the distribution of appearance frequencies based on the surrogate data method.

すなわち、処理部１１５は、ランダムデータ（第２データ）を生成する（ステップＳ２４２）。ランダムデータは、時間軸上で対話装置１０の瞬き動作の間隔の順番をランダムに変更したデータを含む。 That is, the processing unit 115 generates random data (second data) (step S242). The random data includes data obtained by randomly changing the order of blinking motion intervals of the interactive device 10 on the time axis.

図９は、ランダムデータＲ１～ＲＫの一例を示す図である。処理部１１５は、Ｋ通り（例えば、１０００通り）のランダムデータＲ１～ＲＫを生成する。図９に示すランダムデータＲ１～ＲＫにおいては、対話装置１０の瞬き動作の間隔の順番が変更され、ユーザＵの瞬きの間隔の順番は変更されていない。なお、図７に示すタイミング「ｔ２ｊ」と、図９に示すタイミング「ｔ２ｊａ」とが対応する。 FIG. 9 is a diagram showing an example of random data R1-RK. The processing unit 115 generates K types (for example, 1000 types) of random data R1 to RK. In the random data R1 to RK shown in FIG. 9, the order of the blinking intervals of the interactive device 10 is changed, and the order of the blinking intervals of the user U is not changed. Note that the timing "t2j" shown in FIG. 7 corresponds to the timing "t2ja" shown in FIG.
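The generation of the random data in step S242 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the embodiment's implementation: the function name `make_surrogates`, the fixed seed, and the choice to keep the first blinking-motion time in place and rebuild the later times from the shuffled intervals are all assumptions.

```python
import numpy as np

def make_surrogates(device_blinks, k=1000, seed=0):
    """Return k surrogate series with shuffled inter-blink intervals (hypothetical helper).

    device_blinks: blinking-motion times of the device, in seconds (at least one).
    Each surrogate keeps the number of blinks and the multiset of intervals,
    but the order of the intervals is randomized.
    """
    rng = np.random.default_rng(seed)
    times = np.sort(np.asarray(device_blinks, dtype=float))
    intervals = np.diff(times)
    surrogates = []
    for _ in range(k):
        shuffled = rng.permutation(intervals)
        # Assumption: the first time stays fixed; later times are rebuilt
        # by accumulating the shuffled intervals.
        rebuilt = np.concatenate(([times[0]], times[0] + np.cumsum(shuffled)))
        surrogates.append(rebuilt)
    return surrogates
```

The user's blink times are left untouched here, matching the case shown in FIG. 9.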

次に、処理部１１５は、生成したランダムデータの各々について、ユーザＵの瞬きのタイミングと、対話装置１０の瞬き動作のタイミングとの差異であるタイミング差を算出する（ステップＳ２４３）。タイミング差の算出方法は、ステップＳ２４１と同じでよい。瞬きのタイミングｔ１ｉａと瞬き動作のタイミングｔ２ｊａとの差異であるタイミング差を、以下「Δｔｉｊａ」と表す。図９に示す場合、処理部１１５は、ランダムデータに基づいて、タイミング差ＴＲ｛Δｔ１１ａ、Δｔ１５ａ、・・・、Δｔ１３ａ、Δｔ２１ａ、Δｔ２５ａ、・・・、Δｔ２３ａ、・・・ΔｔＢ１ａ、ΔｔＢ５ａ、・・・、ΔｔＢ３ａ｝を算出する。ランダムデータにおけるタイミング差の出現頻度は、例えば、図８のグラフＲＧで示される。なお、グラフＲＧは、ランダムデータＲ１～ＲＫのタイミング差の出現頻度の平均を示す。 Next, for each of the generated random data, the processing unit 115 calculates the timing differences between the blink timings of the user U and the blinking-motion timings of the interactive device 10 (step S243). The timing differences may be calculated in the same way as in step S241. The timing difference between the blink timing t1ia and the blinking-motion timing t2ja is hereinafter denoted "Δtija". In the case shown in FIG. 9, the processing unit 115 calculates, based on the random data, the timing differences TR {Δt11a, Δt15a, ..., Δt13a, Δt21a, Δt25a, ..., Δt23a, ..., ΔtB1a, ΔtB5a, ..., ΔtB3a}. The appearance frequency of the timing differences in the random data is shown, for example, by the graph RG in FIG. 8. The graph RG shows the average appearance frequency of the timing differences over the random data R1 to RK.

次に、処理部１１５は、瞬きデータに基づくタイミング差と、ランダムデータに基づくタイミング差とに応じた評価値を算出する（ステップＳ２４４）。評価値は、対話装置１０とユーザＵとのコミュニケーションの質の指標となる指標値である。ランダムデータは、対話装置１０の瞬き動作の間隔をランダムに変更したデータである。このため、ランダムデータは、対話装置１０の瞬き動作の回数、及び間隔を維持したまま、時系列の情報が崩されたデータといえる。よって、タイミング差ＴＤの分布と、ランダムデータＲ１～ＲＫの分布とを比較することによって、ユーザＵの瞬きデータが示すタイミング差の出現分布が、対話装置１０とユーザＵとのコミュニケーションに起因して現れた度合いを把握することができる。 Next, the processing unit 115 calculates an evaluation value according to the timing differences based on the blink data and the timing differences based on the random data (step S244). The evaluation value is an index value that serves as an index of the quality of communication between the interactive device 10 and the user U. The random data is data obtained by randomly rearranging the intervals between the blinking motions of the interactive device 10. The random data can therefore be regarded as data in which the time-series information has been destroyed while the number of blinking motions of the interactive device 10 and their intervals are preserved. Consequently, by comparing the distribution of the timing differences TD with the distribution of the random data R1 to RK, it is possible to grasp the degree to which the distribution of timing differences indicated by the blink data of the user U arose from the communication between the interactive device 10 and the user U.

処理部１１５は、評価値をＺ値によって算出する。すなわち、処理部１１５は、瞬きデータが示すタイミング差ＴＤ｛Δｔ１１、Δｔ１２、・・・、Δｔ１７、Δｔ２１、Δｔ２２、・・・、Δｔ２７、・・・ΔｔＢ１、ΔｔＢ２、・・・、ΔｔＢ７｝の各々から、ランダムデータＲ１～ＲＫにおけるタイミング差の平均値を減じ、さらに、得られた値をランダムデータＲ１～ＲＫにおけるタイミング差の標準偏差で除することによって、評価値を算出する。例えば、タイミング差ＴＤの分布がランダムデータの分布と同じである場合、評価値は「０」である。この場合、ユーザＵの瞬きが、対話装置１０とユーザＵとのコミュニケーションに起因の影響を受けていないと推測することができる。一方、評価値が大きく、タイミング差ＴＤの分布とランダムデータの分布との差異が大きい場合、ユーザＵの瞬きが、対話装置１０とユーザＵとのコミュニケーションの影響を受けていると推測することができる。図８を用いて説明すると、出現頻度の差異Δが大きい場合ほど、コミュニケーションの影響をより強く受けていると推測され、評価値は大きくなる。 The processing unit 115 calculates the evaluation value as a Z value. That is, the processing unit 115 subtracts the average value of the timing differences in the random data R1 to RK from each of the timing differences TD {Δt11, Δt12, ..., Δt17, Δt21, Δt22, ..., Δt27, ..., ΔtB1, ΔtB2, ..., ΔtB7} indicated by the blink data, and divides the result by the standard deviation of the timing differences in the random data R1 to RK, thereby calculating the evaluation value. For example, when the distribution of the timing differences TD is the same as the distribution of the random data, the evaluation value is "0". In this case, it can be inferred that the blinks of the user U are not influenced by the communication between the interactive device 10 and the user U. On the other hand, when the evaluation value is large and the distribution of the timing differences TD differs greatly from the distribution of the random data, it can be inferred that the blinks of the user U are influenced by the communication between the interactive device 10 and the user U. Referring to FIG. 8, the larger the difference Δ in appearance frequency, the more strongly the blinks are presumed to be influenced by the communication, and the larger the evaluation value.

図１０は、評価値の一例を示すグラフである。図１０に示すグラフにおいて、横軸はタイミング差に対応し、縦軸は評価値に対応する。タイミング差が正の値である場合、ユーザＵの瞬きのタイミングが対話装置１０の瞬き動作のタイミングよりも遅れていることを意味する。タイミング差が負の値である場合、ユーザＵの瞬きのタイミングが対話装置１０の瞬き動作のタイミングよりも早いことを意味する。また、タイミング差は、ここでは、２５０ミリ秒刻みで表す。図１０において、タイミング差「０」ミリ秒は、タイミング差が０ミリ秒以上２５０ミリ秒未満であることを示す。タイミング差「＋２５０」ミリ秒及び「－２５０」ミリ秒は、タイミング差が２５０ミリ秒以上５００ミリ秒未満であることを示す。本実施形態では、処理部１１５は、ユーザＵの瞬きのタイミングが対話装置１０の瞬き動作のタイミングよりも後の所定期間内に含まれる度合いに基づいて、対話装置１０とユーザＵのコミュニケーションの質の評価値を算出する。具体的には、処理部１１５は、タイミング差「＋２５０」ミリ秒に対応する評価値を、対話装置１０とユーザＵとのコミュニケーションの質の評価値として算出する。すなわち、処理部１１５は、ユーザＵの瞬きのタイミングが、対話装置１０の瞬き動作のタイミングよりも遅れ、かつそのタイミング差が２５０ミリ秒以上５００ミリ秒未満である瞬きの出現頻度に基づいて、評価値を算出する。図１０に示す例では、評価値は「０．４」である。以上が、指標算出処理の説明である。 FIG. 10 is a graph showing an example of evaluation values. In the graph shown in FIG. 10, the horizontal axis corresponds to the timing difference, and the vertical axis corresponds to the evaluation value. A positive timing difference means that the blink of the user U is later than the blinking motion of the interactive device 10. A negative timing difference means that the blink of the user U is earlier than the blinking motion of the interactive device 10. The timing difference is expressed here in increments of 250 milliseconds. In FIG. 10, a timing difference of "0" milliseconds indicates that the timing difference is 0 milliseconds or more and less than 250 milliseconds. Timing differences of "+250" milliseconds and "-250" milliseconds indicate that the magnitude of the timing difference is 250 milliseconds or more and less than 500 milliseconds. In this embodiment, the processing unit 115 calculates the evaluation value of the quality of communication between the interactive device 10 and the user U based on the degree to which the blink timings of the user U fall within a predetermined period after the blinking-motion timings of the interactive device 10. Specifically, the processing unit 115 calculates the evaluation value corresponding to the timing difference of "+250" milliseconds as the evaluation value of the quality of communication between the interactive device 10 and the user U. That is, the processing unit 115 calculates the evaluation value based on the appearance frequency of blinks of the user U that are later than the blinking motion of the interactive device 10 by a timing difference of 250 milliseconds or more and less than 500 milliseconds. In the example shown in FIG. 10, the evaluation value is "0.4". This concludes the description of the index calculation processing.
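One way to read steps S243 and S244 is as a per-bin Z value on the appearance frequencies, which the following Python sketch illustrates. It is an interpretation under assumptions, not the embodiment's code: the function names, the ±1000 ms span, the use of the population standard deviation, and returning 0 for bins whose surrogate counts never vary are hypothetical choices. Bins are labeled by their lower edge in milliseconds, so the key `250` covers 250 ms or more and less than 500 ms; the labels of negative bins under this scheme may not match the labeling used in FIG. 10.

```python
import numpy as np

def bin_counts(user_blinks, device_blinks, edges_s):
    """Histogram of all pairwise timing differences (user minus device)."""
    user = np.asarray(user_blinks, dtype=float)
    device = np.asarray(device_blinks, dtype=float)
    diffs = (user[:, None] - device[None, :]).ravel()
    counts, _ = np.histogram(diffs, bins=edges_s)  # bins are [lo, hi); last bin also includes hi
    return counts

def evaluation_values(user_blinks, device_blinks, surrogates, bin_ms=250, span_ms=1000):
    """Return {bin lower edge in ms: Z value} (hypothetical helper).

    surrogates: list of surrogate device blink-time series (the random data).
    """
    edges_ms = np.arange(-span_ms, span_ms + bin_ms, bin_ms)
    edges_s = edges_ms / 1000.0
    observed = bin_counts(user_blinks, device_blinks, edges_s)
    surrogate_counts = np.array(
        [bin_counts(user_blinks, s, edges_s) for s in surrogates], dtype=float
    )
    mean = surrogate_counts.mean(axis=0)
    std = surrogate_counts.std(axis=0)
    z = np.zeros_like(mean)
    # Assumption: bins whose surrogate counts never vary get Z = 0.
    np.divide(observed - mean, std, out=z, where=std > 0)
    return {int(lo): float(v) for lo, v in zip(edges_ms[:-1], z)}
```

Under these assumptions, the evaluation value used in this embodiment would correspond to `evaluation_values(...)[250]`.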

なお、ランダムデータは、対話装置１０の瞬き動作の間隔の順番が変更されておらず、ユーザＵの瞬きの間隔の順番が変更されたデータであってもよい。また、ランダムデータは、対話装置１０の瞬き動作の間隔の順番、及びユーザＵの瞬きの間隔の順番が変更されたデータであってもよい。指標算出処理が終了すると、対話装置１０の処理はステップＳ３に進む。 The random data may be data in which the order of the blinking intervals of the user U is changed without changing the order of the blinking motion intervals of the interactive device 10 . The random data may be data in which the order of the blinking intervals of the interactive device 10 and the order of the blinking intervals of the user U are changed. When the index calculation process ends, the process of the interactive device 10 proceeds to step S3.

次に、図５に戻って対話処理に関する処理を説明する。処理部１１５は、ステップＳ２４の指標算出処理で算出された評価値が閾値以上であるかどうかを判断する（ステップＳ３１）。評価値は、ここでは、直近の期間に対応する評価値である。閾値は、ユーザＵが対話装置１０との対話に関心があるかどうかを判断する際の指標となる値である。閾値は、例えば、あらかじめ決められた値である。 Next, referring back to FIG. 5, processing related to interactive processing will be described. The processing unit 115 determines whether the evaluation value calculated in the index calculation process of step S24 is equal to or greater than the threshold (step S31). The evaluation value here is the evaluation value corresponding to the most recent period. The threshold is a value that serves as an index for determining whether the user U is interested in interacting with the interactive device 10 . The threshold is, for example, a predetermined value.

ステップＳ３１で「ＹＥＳ」と判断した場合、処理部１１５は、第１対話処理を行う（ステップＳ３２）。ステップＳ３１で「ＮＯ」と判断した場合、処理部１１５は、第１対話処理とは異なる第２対話処理を行う（ステップＳ３３）。すなわち、処理部１１５は、評価値が閾値以上であるか否かに応じて異なる対話処理を行う。評価値が閾値以上である場合、ユーザＵの対話装置１０とのコミュニケーションへの関心度合いが高いと推測できる。よって、処理部１１５は、例えば、現在の話題を変更しない第１対話処理を行う。処理部１１５は、例えば、ユーザＵとサッカーについて対話をしていた場合、引き続きサッカーについて対話をする。この際、処理部１１５は、対話データに含まれる識別子「ＩＤ００１」に対応付けられた入力データ及び出力データに基づいて、第１対話処理を行う。 If the determination in step S31 is "YES", the processing unit 115 performs first dialogue processing (step S32). If the determination in step S31 is "NO", the processing unit 115 performs second dialogue processing different from the first dialogue processing (step S33). That is, the processing unit 115 performs different dialogue processing depending on whether or not the evaluation value is equal to or greater than the threshold. If the evaluation value is equal to or greater than the threshold, it can be inferred that the user U has a high degree of interest in communication with the interactive device 10. The processing unit 115 therefore performs, for example, first dialogue processing that does not change the current topic. For example, if the processing unit 115 has been talking with the user U about soccer, it continues to talk about soccer. In doing so, the processing unit 115 performs the first dialogue processing based on the input data and output data associated with the identifier "ID001" included in the dialogue data.

一方、評価値が閾値未満である場合、ユーザＵの対話装置１０とのコミュニケーションへの関心度合いが低いと推測できる。よって、処理部１１５は、現在の話題を変更した第２対話処理を行う。処理部１１５は、例えば、ユーザＵとサッカーについて対話をしていた場合、今日のランチについての対話に変更する。この際、処理部１１５は、対話データに含まれる識別子「ＩＤ００２」に対応付けられた入力データ及び出力データに基づいて、第２対話処理を行う。 On the other hand, if the evaluation value is less than the threshold, it can be inferred that the user U has a low degree of interest in communication with the interactive device 10 . Therefore, the processing unit 115 performs the second dialogue process with the current topic changed. For example, when the processing unit 115 is having a dialogue with the user U about soccer, the processing unit 115 changes the dialogue to a dialogue about today's lunch. At this time, the processing unit 115 performs the second dialogue process based on the input data and output data associated with the identifier “ID002” included in the dialogue data.

以上のとおり、処理部１１５は、対話装置１０の瞬き動作のタイミングとユーザＵの瞬きのタイミングとの差異に基づく指標値（本実施形態では、評価値）に応じた処理を行う。ただし、上述した第１対話処理及び第２対話処理は一例であり、種々の変形が可能である。処理部１１５は、評価値が閾値未満になると直ちに対話の話題を変更した第２対話処理を行うのではなく、第１対話処理を継続してもよい。この場合、処理部１１５は、評価値が閾値未満である期間が所定期間継続した場合、又は評価値が閾値未満となった回数が所定回数以上となった場合に、第１対話処理から第２対話処理に変更してもよい。そして、対話装置１０の処理はステップＳ３に進む。 As described above, the processing unit 115 performs processing according to an index value (in this embodiment, the evaluation value) based on the difference between the blinking-motion timing of the interactive device 10 and the blink timing of the user U. However, the first dialogue processing and second dialogue processing described above are examples, and various modifications are possible. Instead of switching to the second dialogue processing, in which the topic is changed, as soon as the evaluation value falls below the threshold, the processing unit 115 may continue the first dialogue processing. In this case, the processing unit 115 may switch from the first dialogue processing to the second dialogue processing when the evaluation value has remained below the threshold for a predetermined period, or when the number of times the evaluation value has fallen below the threshold reaches a predetermined number or more. The processing of the interactive device 10 then proceeds to step S3.
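The threshold decision of steps S31 to S33, together with the delayed-switch variation, can be sketched as a small state holder. This is only an illustration: the class name `TopicSelector`, the `patience` parameter, and the choice to count consecutive below-threshold evaluations (one possible reading of "remained below the threshold for a predetermined period") are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TopicSelector:
    """Chooses between first and second dialogue processing (hypothetical helper)."""
    threshold: float
    patience: int = 1       # assumed: consecutive low evaluations required to switch
    below_count: int = 0

    def update(self, evaluation_value: float) -> str:
        """Return "first" to keep the topic or "second" to change it."""
        if evaluation_value >= self.threshold:
            self.below_count = 0
            return "first"
        self.below_count += 1
        # Switch only once the value has stayed low often enough.
        return "second" if self.below_count >= self.patience else "first"
```

With `patience=1` the selector reproduces the immediate switch of steps S31 to S33; larger values postpone the topic change.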

ステップＳ３において、処理部１１５は、対話処理を終了するかどうかを判断する。処理部１１５は、例えば、撮像部１５から供給された撮像データに基づいて、ユーザＵの存在を認識しなくなった場合には、対話処理を終了すると判断する。処理部１１５は、音声入力部１２を介して入力された音声から所定の音声（例えば、別れのあいさつを示す音声）を認識した場合、又は所定の操作を受け付けた場合に、対話処理を終了すると判断してもよい。 In step S3, the processing unit 115 determines whether or not to end the dialogue processing. For example, the processing unit 115 determines to end the dialogue processing when, based on the imaging data supplied from the imaging unit 15, it no longer recognizes the presence of the user U. The processing unit 115 may also determine to end the dialogue processing when it recognizes a predetermined voice (for example, a voice indicating a farewell greeting) in the voice input via the voice input unit 12, or when it receives a predetermined operation.

対話処理を継続すると判断した場合（ステップＳ３；ＮＯ）、対話装置１０の処理は、ステップＳ１１，Ｓ２１，Ｓ３１に戻される。対話処理を終了すると判断した場合（ステップＳ３；ＹＥＳ）、処理部１１５は対話処理を終了する。 If it is determined to continue the dialogue processing (step S3; NO), the processing of the dialogue device 10 is returned to steps S11, S21 and S31. If it is determined to end the dialogue processing (step S3; YES), the processing unit 115 ends the dialogue processing.

対話装置１０によれば、ユーザＵの瞬きのタイミングと、対話装置１０の瞬き動作のタイミングとの差異に応じて、ユーザＵの対話装置１０とのコミュニケーションに対する関心度合いを定量的に評価することができる。さらに、対話装置１０は、この評価を対話処理に反映させることにより、対話装置１０とユーザＵとのコミュニケーションを支援することができる。また、対話装置１０は、ユーザＵの瞬きという自然な動作に基づいて、コミュニケーションの質を評価することができる。よって、対話装置１０によれば、ユーザＵに評価のために必要な動作を要求しなくとも、当該評価を行うことができる。 According to the interactive device 10, the degree of the user U's interest in communication with the interactive device 10 can be quantitatively evaluated according to the difference between the blink timing of the user U and the blinking-motion timing of the interactive device 10. Furthermore, by reflecting this evaluation in the dialogue processing, the interactive device 10 can support communication between the interactive device 10 and the user U. The interactive device 10 can also evaluate the quality of communication based on the user U's blinking, which is a natural action. The interactive device 10 can therefore perform the evaluation without requiring the user U to perform any action specifically for the evaluation.

ここで、ユーザＵの瞬きのタイミングと、対話装置１０の瞬き動作のタイミングとの差異を、コミュニケーションの質の評価に用いることができる根拠を説明する。発明者らは、以下で説明する方法で、話し手及び聞き手の瞬きのタイミング差が、話し手と聞き手とのコミュニケーションに対する関心度合いの指標になることを確認する検証を行った。 Here, the reason why the difference between the blinking timing of the user U and the blinking motion of the interactive device 10 can be used to evaluate the quality of communication will be described. The inventors conducted verification by the method described below to confirm that the difference in blink timing between the speaker and the listener serves as an indicator of the degree of interest in communication between the speaker and the listener.

話し手は、商品の実演販売を業とする実演販売士である。実演販売士は、「女性向けのドライヤー」、「女性向けの化粧品」、「男性向けの腕時計」、及び「男性向けの電気シェーバー」の４つの商品について、それぞれ商品紹介を行った。各商品の紹介時間は約３分である。聞き手は、男性が１８人、女性が２０人の計３８人の大学生である。３８人の聞き手は、話し手が行った商品紹介の様子を撮像した動画を視聴した後、各商品紹介を面白く感じたかどうかを回答した。話し手と聞き手との瞬きのタイミング差については、話し手及び聞き手をそれぞれ撮像した動画から検出し、話し手の瞬きのタイミングの前後における、聞き手の瞬きのタイミングを解析した。 The speaker was a professional product demonstrator whose occupation is selling goods through live demonstrations. The demonstrator introduced four products: a "hair dryer for women", "cosmetics for women", a "wristwatch for men", and an "electric shaver for men". Each product was introduced for about 3 minutes. The listeners were 38 university students in total, 18 men and 20 women. After watching a video of the speaker's product introductions, the 38 listeners answered whether they found each product introduction interesting. The blink timing differences between the speaker and the listeners were detected from videos recorded of the speaker and of each listener, and the listeners' blink timings before and after each blink of the speaker were analyzed.

図１１は、３８人の聞き手の瞬きの頻度の分布を示すグラフである。図１１に示すグラフにおいて、横軸は時刻に対応し、縦軸は評価値（Ｚ値）に対応する。時刻は、話し手の瞬きのタイミングを「０」とし、それよりも早い聞き手の瞬きのタイミングを負の値で、それよりも遅い聞き手の瞬きのタイミングを正の値で示す。すなわち、図１１のグラフの横軸はタイミング差を示す。図１１においても、図１０と同様、タイミング差を２５０ミリ秒刻みで表す。この検証においては、評価値の算出にあたり、話し手の瞬きの間隔の順番を変更し、聞き手の瞬きの間隔の順番を変更しない方法によりランダムデータが用いられている。図１１に示すように、話し手の瞬きから＋２５０ミリ秒以上＋５００ミリ秒未満の時間範囲内で遅れて聞き手の瞬きが増大し、評価値が高くなった。なお、ｐ値は、０．０００００１である。 FIG. 11 is a graph showing the distribution of blink frequency for the 38 listeners. In the graph shown in FIG. 11, the horizontal axis corresponds to time and the vertical axis corresponds to the evaluation value (Z value). For the time axis, the timing of the speaker's blink is taken as "0"; listener blinks earlier than that are shown as negative values, and listener blinks later than that are shown as positive values. That is, the horizontal axis of the graph in FIG. 11 indicates the timing difference. In FIG. 11, as in FIG. 10, the timing difference is expressed in increments of 250 milliseconds. In this verification, the random data used to calculate the evaluation value was generated by changing the order of the speaker's blink intervals while leaving the order of the listeners' blink intervals unchanged. As shown in FIG. 11, listener blinks increased with a delay in the time range of +250 milliseconds or more and less than +500 milliseconds after the speaker's blink, and the evaluation value became high. The p-value is 0.000001.

図１２は、回答結果毎の聞き手の瞬きの頻度の分布を示すグラフである。図１２に示すグラフにおいても、図１１と同様、横軸は時刻に対応し、縦軸は評価値に対応する。実線のグラフは、商品紹介を面白いと回答した聞き手についての評価値である。破線のグラフは、商品紹介をつまらないと回答した聞き手についての評価値である。 FIG. 12 is a graph showing the distribution of listener blink frequency for each answer result. In the graph shown in FIG. 12, as in FIG. 11, the horizontal axis corresponds to the time and the vertical axis corresponds to the evaluation value. The solid line graph is the evaluation value of listeners who answered that the product introduction was interesting. The dashed line graph is the evaluation value for listeners who answered that the product introduction was boring.

図１２に示すように、商品紹介を面白いと回答した聞き手については、話者の瞬きから＋２５０ミリ秒以上＋５００ミリ秒未満の時間範囲内で遅れて瞬きが増大し、評価値が高くなった。一方、商品紹介をつまらないと回答した聞き手については、このような瞬きの増大、及び評価値の高まりは確認できなかった。なお、ｐ値は、０．００４である。 As shown in FIG. 12, for listeners who answered that the product introduction was interesting, blinks increased with a delay in the time range of +250 milliseconds or more and less than +500 milliseconds after the speaker's blink, and the evaluation value became high. On the other hand, for listeners who answered that the product introduction was boring, no such increase in blinks or rise in the evaluation value was observed. The p-value is 0.004.

図１３は、評価値を聞き手の性別及び商品別に示したグラフである。図１３に示すように、「女性向けの化粧品」については、女性の聞き手については評価値が高い値を示す一方、男性の聞き手については評価値が低い値を示した。「男性向けの腕時計」及び「男性向けの電気シェーバー」については、男性の聞き手については評価値が高い値を示す一方、女性の聞き手については評価値が低い値を示した。 FIG. 13 is a graph showing evaluation values according to listener's gender and product. As shown in FIG. 13, regarding "cosmetics for women", female listeners showed high evaluation values, while male listeners showed low evaluation values. Regarding "watches for men" and "electric shavers for men", male listeners showed high evaluation values, while female listeners showed low evaluation values.

図１４は、商品への関心度を聞き手の性別及び商品別に示したグラフである。図１４に示す商品に対する関心度は、各被験者が「非常に退屈」と回答した場合の関心度を「１」、「少し退屈」と回答した場合の関心度を「２」、「少し面白い」と回答した場合の関心度を「３」、「非常に面白い」と回答した場合の関心度を「４」とし、男女それぞれについてその関心度の平均をとった値を示す。関心度の値が大きいほど、聞き手が商品紹介に高い関心を示したことを意味する。図１３と図１４とを対比すると、各商品について、評価値と商品紹介への関心度とが相関することが確認できた。 FIG. 14 is a graph showing the degree of interest in the products by listener gender and by product. The degree of interest shown in FIG. 14 is scored as "1" when a subject answered "very boring", "2" when the subject answered "a little boring", "3" when the subject answered "a little interesting", and "4" when the subject answered "very interesting", and the graph shows the average of these scores for men and for women separately. A larger value means that the listeners showed greater interest in the product introduction. Comparing FIG. 13 with FIG. 14 confirmed that, for each product, the evaluation value correlates with the degree of interest in the product introduction.

以上の検証により、発明者らは、話し手と聞き手との瞬きのタイミング差が、聞き手の話し手の対話への関心度合いと相関する、という知見を得られた。 From the above verification, the inventors have obtained the knowledge that the difference in blink timing between the speaker and the listener correlates with the listener's degree of interest in the speaker's dialogue.

［第２実施形態］
第２実施形態は、対話装置１０の周辺の環境に基づいて、対話装置１０の瞬き動作のタイミングを制御する。以下の説明において、上述した第１実施形態の要素と同一の要素は同じ符号を付して表す。本実施形態の対話装置１０のハードウェア構成は、上述した第１実施形態と同じでよい。 [Second embodiment]
The second embodiment controls the timing of the blinking action of the interactive device 10 based on the environment around the interactive device 10 . In the following description, elements that are the same as those of the first embodiment described above are denoted by the same reference numerals. The hardware configuration of the interactive device 10 of this embodiment may be the same as that of the first embodiment described above.

図１５は、本実施形態の対話装置１０の機能構成を示すブロック図である。対話装置１０の制御部１１は、プログラム１４１を実行することにより、瞬き動作制御部１１１と、第１取得部１１２と、瞬き検出部１１３と、第２取得部１１４と、処理部１１５と、環境情報取得部１１６と、記憶制御部１１７とに相当する機能を実現する。 FIG. 15 is a block diagram showing the functional configuration of the interactive device 10 of this embodiment. By executing the program 141, the control unit 11 of the interactive device 10 realizes functions corresponding to the blinking motion control unit 111, the first acquisition unit 112, the blink detection unit 113, the second acquisition unit 114, the processing unit 115, the environment information acquisition unit 116, and the storage control unit 117.

環境情報取得部１１６は、対話装置１０の周辺の環境を示す環境情報を取得する。環境情報は、ここでは、対話装置１０がユーザＵによって使用されているときの環境、換言すると、対話装置１０がユーザＵと対話しているときの環境を示す。環境情報は、例えば、音声情報、音圧情報、プロソディ、動き情報、及び周辺情報のうちの１つ以上を含む。音声情報は、音声入力部１２を介して入力された音声を示す情報、音声出力部１３を介して出力される音声を示す情報、又はこれらの両方を含む。音圧情報は、当該音声情報の所定の周波数帯域（例えば、可聴域）における音圧を示す。プロソディは、発話において現れる音声学的性質を示し、例えば抑揚である。動き情報は、ユーザＵの体動（例えば、顔、体又は表情の動き）を示す。周辺情報は、ユーザＵの周辺の環境を示す（例えば、ユーザＵが居る空間の明るさ）。音声情報、音圧情報、及びプロソディは、音声入力部１２を介して入力された音声、及び音声出力部１３に供給される音声信号に基づいて特定される。動き情報、及び周辺情報は、撮像部１５を用いて特定される。環境情報は、別の計測装置を用いて取得されてもよい。また、環境情報は、さらに、対話装置１０の想定年齢、性別、職業、及びユーザＵに関する情報を含んでもよい。 The environment information acquisition unit 116 acquires environment information indicating the environment around the interactive device 10. Here, the environment information indicates the environment when the interactive device 10 is being used by the user U, in other words, the environment when the interactive device 10 is interacting with the user U. The environment information includes, for example, one or more of audio information, sound pressure information, prosody, motion information, and peripheral information. The audio information includes information indicating audio input via the audio input unit 12, information indicating audio output via the audio output unit 13, or both. The sound pressure information indicates the sound pressure of the audio information in a predetermined frequency band (for example, the audible range). Prosody indicates phonetic properties that appear in speech, such as intonation. The motion information indicates body movement of the user U (for example, movement of the face, body, or facial expression). The peripheral information indicates the environment around the user U (for example, the brightness of the space where the user U is present). The audio information, the sound pressure information, and the prosody are identified based on the audio input via the audio input unit 12 and the audio signal supplied to the audio output unit 13. The motion information and the peripheral information are identified using the imaging unit 15. The environment information may be acquired using a separate measurement device. The environment information may further include the assumed age, gender, and occupation of the interactive device 10, and information about the user U.

The storage control unit 117 causes the storage unit 14 to accumulate, as learning data 143, data that associates the timing at which the user U blinked with the environment in which the interactive device 10 is being used. That is, the learning data 143 is data indicating the result of learning the relationship between the environment around the interactive device 10 and the timing at which the user U actually blinks. Note that the storage control unit 117 may accumulate the learning data 143 in a storage device other than the storage unit 14, for example a storage device of a cloud storage service.

The blinking motion control unit 111 causes the interactive device 10 to perform a blinking motion at a timing (first timing) that depends on the environment information acquired by the environment information acquisition unit 116. Specifically, the blinking motion control unit 111 causes the interactive device 10 to perform a blinking motion at a timing that depends on the learning data 143 stored in the storage unit 14 and on the environment information acquired by the environment information acquisition unit 116. The blinking motion control unit 111 controls, for example, the timing at which the interactive device 10 performs the blinking motion, the frequency of the blinking motion, or both.

Next, the operation of the interactive device 10 will be described. FIG. 16 is a flowchart showing the learning process executed by the interactive device 10. The learning process is performed in parallel with the dialogue process.

The environment information acquisition unit 116 acquires environment information (step S41). Next, the second acquisition unit 114 acquires the blink timing of the user U (step S42). Next, the storage control unit 117 causes the storage unit 14 to accumulate, as learning data 143, data that associates the environment indicated by the environment information acquired in step S41 with the blink timing acquired in step S42 (step S43). This concludes the description of the learning process.
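Steps S41 to S43 can be sketched as a small accumulation loop. This is a minimal sketch under stated assumptions: `LearningRecord`, `LearningStore`, and `learning_step` are hypothetical names, the environment is represented as a plain dictionary, and skipping a pass in which no blink was observed is an assumption not stated in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LearningRecord:
    """One entry of learning data 143: an environment paired with a user blink time."""
    environment: dict      # hypothetical feature dict, e.g. {"sound_pressure_db": 62.0}
    blink_time_s: float    # time at which the user's blink was detected, in seconds


@dataclass
class LearningStore:
    """Stands in for storage unit 14."""
    records: List[LearningRecord] = field(default_factory=list)

    def accumulate(self, environment: dict, blink_time_s: float) -> None:
        # Step S43: associate the environment with the blink timing and store the pair.
        self.records.append(LearningRecord(dict(environment), blink_time_s))


def learning_step(store: LearningStore,
                  acquire_environment: Callable[[], dict],
                  acquire_blink_time: Callable[[], Optional[float]]) -> None:
    environment = acquire_environment()    # step S41
    blink_time_s = acquire_blink_time()    # step S42
    if blink_time_s is not None:           # assumption: nothing is stored without a blink
        store.accumulate(environment, blink_time_s)


store = LearningStore()
learning_step(store, lambda: {"sound_pressure_db": 62.0}, lambda: 12.4)
learning_step(store, lambda: {"sound_pressure_db": 40.0}, lambda: None)
print(len(store.records))
```

Because the learning process runs in parallel with the dialogue process, a real implementation would call something like `learning_step` repeatedly from its own thread or task.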

FIG. 17 is a flowchart showing the process related to the blinking motion executed by the interactive device 10. The process of FIG. 17 is executed in place of steps S11 and S12 described with reference to FIG. 5.

The environment information acquisition unit 116 acquires environment information (step S51). Next, the blinking motion control unit 111 determines whether or not to perform a blinking motion (step S52). Here, the blinking motion control unit 111 makes this determination based on the environment information acquired in step S51 and the learning data 143. The blinking motion control unit 111 determines, for example by machine learning, whether to perform a blinking motion and the timing at which to perform it. The machine learning algorithm is, for example, a neural network, but another algorithm may be used. Until a predetermined amount of learning data 143 has been accumulated, the blinking motion control unit 111 may determine whether to perform the blinking motion in the same manner as in step S11 of the first embodiment described above.

When the blinking motion control unit 111 determines that a blinking motion is to be performed (step S52; YES), it causes the interactive device 10 to perform the blinking motion (step S53). When the blinking motion control unit 111 determines that a blinking motion is not to be performed (step S52; NO), it does not cause the interactive device 10 to perform the blinking motion. The blink processing unit 115 learns the relationship between the environment around the interactive device 10 and the blinking performed by humans in that environment, and causes the interactive device 10 to perform the blinking motion according to that relationship. For example, the blinking motion control unit 111 may increase the frequency of the blinking motion when it determines that the conversation has turned to a highly unexpected topic. This concludes the description of the process related to the blinking motion.
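The decision in step S52, including the fallback to the first-embodiment rule while learning data is scarce, might look like the sketch below. Several things here are assumptions rather than part of the specification: the environment is reduced to a single scalar feature, each learning sample carries a "blinked or not" label, a k-nearest-neighbour vote is used as a simple stand-in for the neural network mentioned in the text, and the names `should_blink`, `min_samples`, and `k` are hypothetical.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, bool]  # (hypothetical scalar environment feature, user blinked?)


def should_blink(feature: float,
                 samples: List[Sample],
                 fallback: Callable[[], bool],
                 min_samples: int = 50,
                 k: int = 5) -> bool:
    """Step S52: decide whether to blink for the current environment feature."""
    if len(samples) < min_samples:
        # Not enough learning data yet: defer to the first-embodiment rule.
        return fallback()
    # k-nearest-neighbour vote, used here instead of a neural network.
    nearest = sorted(samples, key=lambda s: abs(s[0] - feature))[:k]
    blink_votes = sum(1 for _, blinked in nearest if blinked)
    return blink_votes * 2 > len(nearest)


# Toy data: users tended to blink when the feature value was 5.0 or higher.
samples = [(x / 10.0, x >= 50) for x in range(100)]
high = should_blink(9.0, samples, fallback=lambda: False)
low = should_blink(1.0, samples, fallback=lambda: False)
sparse = should_blink(9.0, samples[:10], fallback=lambda: True)
print(high, low, sparse)
```

A caller would trigger the blinking motion (step S53) only when `should_blink` returns `True`.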

According to the interactive device 10 of the present embodiment, in addition to effects equivalent to those of the first embodiment described above, the blinking motion can be performed at a more natural timing by using the learning data 143. As a result, an improvement in the quality of communication between the interactive device 10 and the user U can be expected.

In addition to the timing that depends on the environment information, the blinking motion control unit 111 may cause the interactive device 10 to perform a blinking motion at yet another timing (second timing). For example, the blinking motion control unit 111 causes the interactive device 10 to perform the blinking motion a predetermined number of times within a predetermined period (for example, 20 times per minute) at random timings. The blinking motion control unit 111 may instead cause the interactive device 10 to perform the blinking motion at timings that follow a predetermined rule. As a result, the interactive device 10 can be expected to perform the blinking motion at more natural timings.
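One way to picture the second timing is to draw a fixed number of random instants within the period and merge them with the environment-driven (first) timings. This is a sketch only: the function names are hypothetical, and the minimum gap between blinks used when merging is an assumption that the text does not mention.

```python
import random
from typing import List, Optional


def random_blink_times(count: int = 20, period_s: float = 60.0,
                       seed: Optional[int] = None) -> List[float]:
    """Second timing: `count` random instants within a period of `period_s` seconds."""
    rng = random.Random(seed)
    return sorted(rng.uniform(0.0, period_s) for _ in range(count))


def merge_timings(first: List[float], second: List[float],
                  min_gap_s: float = 0.3) -> List[float]:
    """Merge first and second timings, dropping blinks closer than min_gap_s (assumption)."""
    merged: List[float] = []
    for t in sorted(first + second):
        if not merged or t - merged[-1] >= min_gap_s:
            merged.append(t)
    return merged


second = random_blink_times(seed=1)
schedule = merge_timings([1.0, 5.0], [1.1, 9.0])
print(len(second), schedule)
```

The default of 20 blinks per 60 seconds follows the example given in the text; a rule-based schedule could replace `random_blink_times` without changing `merge_timings`.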

The storage control unit 117 may accumulate the learning data 143 only during periods in which the evaluation value is equal to or greater than a threshold. This allows the interactive device 10 to perform the blinking motion in accordance with the blinking that humans perform during high-quality communication.
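The threshold condition above amounts to a filter applied before accumulation. In the following sketch, the tuple layout of an observation, the name `select_for_learning`, and the sample values are all hypothetical.

```python
from typing import Dict, List, Tuple

# (evaluation value, environment, user blink time in seconds); layout is an assumption.
Observation = Tuple[float, Dict[str, float], float]


def select_for_learning(observations: List[Observation],
                        threshold: float) -> List[Tuple[Dict[str, float], float]]:
    """Keep only observations made while the evaluation value was at or above the threshold."""
    return [(env, blink_time)
            for value, env, blink_time in observations
            if value >= threshold]


observations = [
    (0.9, {"brightness": 300.0}, 3.2),
    (0.2, {"brightness": 280.0}, 7.8),
    (0.6, {"brightness": 310.0}, 11.5),
]
kept = select_for_learning(observations, threshold=0.6)
print([t for _, t in kept])
```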

Note that if the learning data 143 is stored in the storage unit 14 in advance, the interactive device 10 need not have the function of accumulating the learning data (that is, the storage control unit 117).

[Third Embodiment]
The third embodiment differs from the interactive device of the first embodiment described above in that the interactive device functions as a display device. In the following description, elements identical to those of the first embodiment are denoted by the same reference numerals, and elements corresponding to those of the first embodiment are denoted by the same reference numerals with an "A" appended.

FIG. 18 is a diagram showing an example of the external configuration of an interactive device 10A according to the third embodiment of the present invention. The interactive device 10A has a display area 161. The display area 161 is an area in which an image is displayed. The display area 161 displays an object 20. The object 20 is an image similar in appearance to the interactive device 10 described in the first embodiment. The object 20 has a face part 201 and eyelid parts 202. The face part 201 is a part corresponding to the face. The eyelid parts 202 are arranged on the face part 201 and correspond to the eyelids. The eyelid parts 202 perform a blinking motion by opening and closing. The blinking motion of this embodiment differs from that of the first embodiment in that it is performed by displaying an image in the display area 161. Note that the object 20 shown in FIG. 18 is merely an example; the object need only include at least an image representing a blinking motion. For example, the object 20 includes at least one eyelid part. A display unit may be provided in place of the eyelid parts 102 of the interactive device 10 having the appearance shown in FIG. 1, and an object corresponding to the eyelid parts 202 may be displayed on that display unit.

The user U is a person who communicates with the interactive device 10A. The user U faces the interactive device 10A and carries on a dialogue while observing the object 20 displayed in the display area 161.

FIG. 19 is a block diagram showing the hardware configuration of the interactive device 10A. The interactive device 10A has a control unit 11, an audio input unit 12, an audio output unit 13, a storage unit 14, an imaging unit 15, and a display unit 16. The storage unit 14 stores a program 141A for causing the control unit 11 to realize predetermined functions. The display unit 16 has the display area 161 in which an image is displayed. The display unit 16 is, for example, a liquid crystal display, an organic EL display, or another display device.

FIG. 20 is a block diagram showing the functional configuration of the interactive device 10A. By executing the program 141A, the control unit 11 of the interactive device 10A realizes functions corresponding to a blinking motion control unit 111A, the first acquisition unit 112, the blink detection unit 113, the second acquisition unit 114, and the processing unit 115. The blinking motion control unit 111A causes the object 20 displayed in the display area 161 of the display unit 16 to perform a blinking motion. For example, the blinking motion control unit 111A supplies the display unit 16 with blink control data for producing the blinking motion. The blink control data is data that controls the display of the display unit 16. The display unit 16 causes the object 20 to perform the blinking motion in accordance with the blink control data. The first acquisition unit 112 acquires the timing of the blinking motion of the object 20 (eyelid parts 202).

The operation of the interactive device 10A is the same as that of the first embodiment described above, except that the blinking motion is performed by controlling the display unit 16.

Note that the configuration of this embodiment can also be applied to the interactive device 10 of the second embodiment described above.

[Modifications]
The present invention is not limited to the above embodiments and can be modified as appropriate without departing from its spirit. The following are described as modifications of the interactive device 10 of the first embodiment, but they can also be applied to the interactive device 10 of the second embodiment and the interactive device 10A of the third embodiment.

The method by which the processing unit 115 of the above embodiments calculates the evaluation value is merely an example. For example, the processing unit 115 may count, in time series, the number of blinks of the user U and the number of blinking motions of the interactive device 10, and calculate the evaluation value based on the numbers of blinks and blinking motions in a specific period (scene). In this case, the processing unit 115 calculates an evaluation value indicating that the degree of interest of the user U in communication with the interactive device 10 is higher for periods in which the numbers of blinks and blinking motions are greater. This is because, in periods with many blinks and blinking motions, the difference between the timing of the blinks of the user U and the timing of the blinking motions of the interactive device 10 is considered to be smaller than in other periods. Alternatively, the processing unit 115 may perform processing according to the difference between the timing of the blinking motion and the timing of the user's blink without calculating an evaluation value.
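The count-based variant described above can be sketched by binning blink times into fixed-length periods. Using the raw combined count as the evaluation value, the 10-second period length, and the name `count_based_scores` are all assumptions made for illustration; the text only requires that periods with more blinks and blinking motions receive a value indicating higher interest.

```python
from collections import Counter
from typing import Dict, List


def count_based_scores(user_blinks: List[float], device_blinks: List[float],
                       period_s: float = 10.0) -> Dict[int, int]:
    """Per period index, the combined number of user blinks and device blinking motions."""
    counts: Counter = Counter()
    for t in user_blinks + device_blinks:
        counts[int(t // period_s)] += 1
    return dict(counts)


# Blink times in seconds (made-up values for illustration only).
scores = count_based_scores([1.0, 2.5, 3.0, 14.0], [1.2, 2.7, 25.0])
most_engaged_period = max(scores, key=scores.get)
print(scores, most_engaged_period)
```

Here the first 10-second period contains five of the seven events, so it would be treated as the scene in which the user's interest was highest.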

The processing performed according to the difference between the timing of the blinks of the user U and the timing of the blinking motions of the interactive device 10 is not limited to utterance processing. For example, the processing unit 115 may perform processing that evaluates the interactive device 10. In this case, the processing unit 115 outputs evaluation data in association with an identifier of the interactive device 10. The evaluation data indicates an evaluation of the interactive device 10. The evaluation data may be data indicating the evaluation value, or data generated using the evaluation value. The evaluation data is output by, for example, transmission, printing, display, or another method. According to this modification, the quality of the communication performed by the interactive device 10 can be evaluated.

Blinks may be detected by a method other than one that uses imaging data. One such method uses a non-contact sensor such as a radio wave sensor (for example, a 4 GHz radio wave sensor module), an infrared sensor, or a Doppler sensor. Another method uses a sensor that detects blinks based on the movement of the facial muscles.

Some or all of the functions realized by the control unit 11 described in the above embodiments may be provided in a processing device external to the interactive device. In this case, the processing device controls the interactive device by communication (for example, communication via a public communication line). This control may include control of the blinking motion, control of the dialogue process, or both. The processing device may control a plurality of interactive devices. In short, as shown in FIG. 21, the processing device according to the present disclosure has a first acquisition unit 301 that acquires the timing of the blinking motion of the interactive device, a second acquisition unit 302 that acquires the timing of the blinks of the user of the interactive device, and a processing unit 303 that performs processing according to the difference between the timing of the blinking motion and the timing of the user's blinks.
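The three-unit structure of FIG. 21 can be pictured as two data sources feeding one processing step. The sketch below is illustrative only: the class and function names are hypothetical, the two acquisition units are modelled as callables returning blink times in seconds, and pairing each user blink with the nearest device blinking motion is an assumed way of computing the "difference".

```python
from typing import Callable, List, Sequence


def timing_differences(device_blinks: Sequence[float],
                       user_blinks: Sequence[float]) -> List[float]:
    """For each user blink, the signed lag (s) relative to the nearest device blinking motion."""
    diffs = []
    for u in user_blinks:
        nearest = min(device_blinks, key=lambda d: abs(u - d))
        diffs.append(u - nearest)
    return diffs


class ProcessingDevice:
    def __init__(self,
                 get_device_blinks: Callable[[], List[float]],
                 get_user_blinks: Callable[[], List[float]]) -> None:
        self._get_device_blinks = get_device_blinks   # role of first acquisition unit 301
        self._get_user_blinks = get_user_blinks       # role of second acquisition unit 302

    def process(self) -> List[float]:
        # Role of processing unit 303: act on the timing differences.
        device = self._get_device_blinks()
        user = self._get_user_blinks()
        if not device or not user:
            return []
        return timing_differences(device, user)


processor = ProcessingDevice(lambda: [1.0, 5.0, 9.0], lambda: [1.5, 8.0])
lags = processor.process()
print(lags)
```

Because the acquisition units are plain callables, the same `ProcessingDevice` could serve several interactive devices over a network, as the text allows.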

Part of the configurations and operations of the above embodiments may be omitted. Configurations and operations not described in the above embodiments may be added. The order in which the processes described in the above embodiments are executed is merely an example and may be changed as appropriate.

The functions realized by the control unit 11 may be realized by a combination of a plurality of programs, or by the cooperation of a plurality of hardware resources. When the functions of the control unit 11 are realized using a program, the programs 141 and 141A for realizing those functions may be provided stored on a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory. The programs may also be distributed via a network. The present invention can also be understood as a processing method.

10, 10A: interactive device; 11: control unit; 12: audio input unit; 13: audio output unit; 14: storage unit; 15: imaging unit; 16: display unit; 20: object; 101: face part; 102: eyelid part; 111, 111A: blinking motion control unit; 112: first acquisition unit; 113: blink detection unit; 114: second acquisition unit; 115: processing unit; 116: environment information acquisition unit; 117: storage control unit; 141, 141A: program; 142: dialogue data; 143: learning data; 161: display area; 201: face part; 202: eyelid part; 301: blinking motion control unit; 302: acquisition unit; 303: processing unit

Claims

1. A processing device comprising:
an environment information acquisition unit that acquires environment information indicating the environment around an interactive device; and
a blinking motion control unit that controls the frequency of blinking motions performed by the interactive device based on the environment information,
wherein the blinking motion control unit learns the relationship between the environment around the interactive device and the blinking performed by a human in that environment, and causes the interactive device to perform the blinking motion based on that relationship.

2. The processing device according to claim 1, wherein the environment information is dialogue information from a user to the interactive device.

3. The processing device according to claim 2, wherein the blinking motion control unit increases the frequency of blinking motions performed by the interactive device when the environment information satisfies a predetermined condition.

4. The processing device according to claim 3, wherein the blinking motion control unit increases the frequency of the blinking motion when it determines, based on the learning result, that the dialogue information has turned to a highly unexpected topic.

5. A computer-implemented processing method comprising:
acquiring environment information indicating the environment around an interactive device;
controlling the frequency of blinking motions performed by the interactive device based on the environment information; and
learning the relationship between the environment around the interactive device and the blinking performed by a human in that environment, and causing the interactive device to perform a blinking motion based on that relationship.

6. A program for causing a computer to execute:
acquiring environment information indicating the environment around an interactive device;
controlling the frequency of blinking motions performed by the interactive device based on the environment information; and
learning the relationship between the environment around the interactive device and the blinking performed by a human in that environment, and causing the interactive device to perform a blinking motion based on that relationship.