JP2008146318A

JP2008146318A - Emotion estimation apparatus

Info

Publication number: JP2008146318A
Application number: JP2006332187A
Authority: JP
Inventors: Mikio Danno; 幹男段野; Masahiro Miyaji; 正廣宮治; Taizo Umezaki; 太造梅崎
Original assignee: Toyota Motor Corp; Toyota InfoTechnology Center Co Ltd
Current assignee: Toyota Motor Corp; Toyota InfoTechnology Center Co Ltd
Priority date: 2006-12-08
Filing date: 2006-12-08
Publication date: 2008-06-26
Anticipated expiration: 2026-12-08
Also published as: JP4757787B2

Abstract

PROBLEM TO BE SOLVED: To provide an emotion estimation apparatus that recognizes unspecified persons' expressions and estimates the persons' emotions.

SOLUTION: The emotion estimation apparatus 100 comprises an expression map generation means 11 for learning expression images of specific persons associated with predetermined emotions to generate expression maps based on a neural network, a region division means 12 for dividing the expression maps for the respective specific persons generated by the expression map generation means 11 into a plurality of regions according to the predetermined emotions, and an emotion estimation means 13 for estimating an unspecified person's emotion according to an expression image U of the unspecified person and the plurality of expression maps with the regions divided by the region division means 12. A region standardization means 14 is further provided for standardizing corresponding regions of the plurality of expression maps, and the emotion estimation means 13 estimates the unspecified person's emotion according to the expression image U of the unspecified person and the plurality of expression maps with the regions standardized.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an emotion estimation apparatus that uses a self-organizing map based on a Kohonen-type neural network, and more particularly to an emotion estimation apparatus that recognizes the facial expression of an unspecified person and estimates that person's emotion.

Conventionally, an information processing apparatus is known that stores emotion information, indicating the emotion a user is generally expected to feel when viewing a given movie scene, in association with an expression image of the user who viewed that scene, and that estimates a specific user's emotion from that user's expression image by using a neural-network emotion model trained on the relationship between the emotion information and the expression images (see, for example, Patent Document 1).

Because this information processing apparatus creates an emotion model for each user and estimates each user's emotion from that user's own model, it can estimate each user's emotion more accurately, unaffected by differences in how individual users react to a movie scene.

Patent Document 1: JP 2005-346471 A

However, the information processing apparatus described in Patent Document 1 is intended to estimate the emotion of a specific person whose expression images the emotion model has learned. It cannot estimate the emotion of an unspecified person, whose expression images the emotion model has not learned, from that person's expression image.

In view of the above, an object of the present invention is to provide an emotion estimation apparatus that uses a Kohonen-type neural network to recognize the facial expression of an unspecified person and estimate that person's emotion.

To achieve the above object, the emotion estimation apparatus according to the first invention comprises: expression map generation means for generating a neural-network-based expression map by learning expression images of specific persons, each image being associated with a predetermined emotion; region division means for dividing the expression map generated for each specific person by the expression map generation means into a plurality of regions based on the predetermined emotions; and emotion estimation means for estimating the emotion of an unspecified person based on an expression image of the unspecified person and the plurality of expression maps whose regions have been divided by the region division means.

The second invention is the emotion estimation apparatus according to the first invention, wherein the region division means determines in advance at least one of the center position, shape, or area of each region to be divided.

The third invention is the emotion estimation apparatus according to the first invention, further comprising region standardization means for making common the regions divided by the region division means that correspond to one another across the plurality of expression maps, wherein the emotion estimation means estimates the emotion of the unspecified person based on the expression image of the unspecified person and the plurality of expression maps whose regions have been made common by the region standardization means.

The fourth invention is the emotion estimation apparatus according to the third invention, wherein the region standardization means makes at least one of the shape, center position, or area of the corresponding regions common.

With the means described above, the present invention can provide an emotion estimation apparatus that uses a Kohonen-type neural network to recognize the facial expression of an unspecified person and estimate that person's emotion.

The best mode for carrying out the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing a configuration example of the emotion estimation apparatus 100 according to the present invention. The emotion estimation apparatus 100 comprises a control unit 1, an imaging unit 2, and a display unit 3. The control unit 1 estimates a person's emotion from the expression image of that person captured by the imaging unit 2 and outputs the estimation result to the display unit 3.

The control unit 1 is a computer comprising a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), and the like. Programs corresponding to the expression image processing means 10, the expression map generation means 11, the region division means 12, the emotion estimation means 13, and the region standardization means 14 are stored in the ROM; these programs are loaded into the RAM and the CPU executes the corresponding processing.

The imaging unit 2 is a means for capturing an expression image of a person, for example a CCD (Charge Coupled Device) camera or a CMOS (Complementary Metal-Oxide Semiconductor) camera.

The display unit 3 is a means for outputting the estimation result of the emotion estimation apparatus 100, for example a liquid crystal display. From among emotions such as “anger”, “joy”, “sadness”, “surprise”, “disgust”, “fear”, and “resignation”, it outputs the emotion that matches the expression image captured by the imaging unit 2, expressed as text, video, an illustration, the self-organizing map described later, or the like. The emotion estimation apparatus 100 may also output the estimation result as audio.

Next, the expression image processing means 10 of the control unit 1 is described.

The expression image processing means 10 processes an image captured by the imaging unit 2 so that it can be input to the neural network. The flow of this processing is described here with reference to FIGS. 2 and 3. FIG. 2 is a flowchart showing the flow of the expression image processing, and FIG. 3 shows expression images illustrating each step of that processing.

First, the expression image processing means 10 applies image processing such as edge detection and filtering to the expression image captured by the imaging unit 2, as shown in FIG. 3(A) (for example, a 512 × 512 pixel bitmap file), and locates the person's eyes as shown in FIG. 3(B) (step S1).

The expression image processing means 10 then locates the person's nose in the same way, as shown in FIG. 3(C) (step S2), and converts the expression image to grayscale by binarization processing (step S3). The expression image processing means 10 may instead convert the expression image to grayscale first and then locate the eyes and nose.

Next, the expression image processing means 10 determines the tilt of the face in the expression image from the located eye and nose positions, corrects the tilt by translating or rotating the expression image with an affine transformation (step S4), and normalizes the expression image so that the nose lies at the center of the image, as shown in FIG. 3(D) (step S5).

The expression image processing means 10 then cuts out from the expression image an expression portion of a predetermined size (for example, 256 × 256 pixels) (step S6). The expression portion is the part of the image that conveys the person's expression and contains the eyes, nose, mouth, and cheeks, as shown in FIG. 3(E). Finally, it compresses the cut-out image to a predetermined size (for example, 64 × 64 pixels) using an image compression technique such as JPEG (Joint Photographic Experts Group) so that the image can be used as input to the neural network (step S7).
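The later stages of this pipeline (tilt estimation, cropping around the nose, size reduction) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the eye and nose coordinates are assumed to be already known, the rotation of step S4 is left out, and 4 × 4 block averaging stands in for the JPEG-based reduction of step S7.

```python
import numpy as np

def face_tilt_degrees(left_eye, right_eye):
    """Roll angle of the line joining the eyes, in degrees; points are (x, y)."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return float(np.degrees(np.arctan2(dy, dx)))

def crop_and_shrink(gray, nose_xy, crop=256, out=64):
    """Cut a crop x crop window centred on the nose, then shrink it to out x out."""
    h, w = gray.shape
    half = crop // 2
    # Clamp the window so it stays inside the image.
    x0 = int(np.clip(nose_xy[0] - half, 0, w - crop))
    y0 = int(np.clip(nose_xy[1] - half, 0, h - crop))
    patch = gray[y0:y0 + crop, x0:x0 + crop].astype(np.float64)
    f = crop // out  # block size (4 for 256 -> 64)
    # Block averaging: an assumed stand-in for the JPEG-based reduction.
    return patch.reshape(out, f, out, f).mean(axis=(1, 3))

# Synthetic 512 x 512 grayscale image in place of a camera frame.
image = np.random.default_rng(0).integers(0, 256, size=(512, 512))
small = crop_and_shrink(image, nose_xy=(256, 256))
x = small.reshape(-1)  # 4096-element input vector for the network
```

The resulting vector `x` has one element per pixel of the 64 × 64 image, which matches the input size described for the neural network.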

Next, the expression map generation means 11 of the control unit 1 is described.

The expression map generation means 11 generates an expression map, which is a self-organizing map, using a Kohonen-type neural network. The flow of the expression map generation processing is described here with reference to FIG. 4, which is a flowchart of that processing.

A “self-organizing map” is a neural network technique that projects multidimensional data onto a two-dimensional plane. Data with similar features are placed close together on the map and dissimilar data are placed far apart, which visualizes high-dimensional information and makes the relationships among the data easy to grasp intuitively.

First, the expression map generation means 11 reads values such as the learning coefficient α, the learning radius r, and the set learning count t from a storage device such as the ROM (step S11), and initializes the value of each node of an expression map of a predetermined size (32 × 32 nodes) with random numbers (step S12).

The expression map is laid out on two-dimensional coordinates and has one neuron at each coordinate. Each neuron is represented, for example, as a weight vector w whose number of elements equals the number of pixels in the input image (4096 for a 64 × 64 pixel image). The input image is likewise treated as a vector with one element per pixel, each element being, for example, the luminance of that pixel.

Initializing the expression map means initializing each element of every weight vector w with a random number. The “set learning count t” is a setting that specifies how many times the learning images (the input images) are learned, where learning means updating each element of the weight vectors w that satisfy a predetermined condition.
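The map layout and its random initialization can be sketched as below. The array shape follows the sizes given in the text (32 × 32 nodes, 4096 weights per neuron); the scaling of luminance to [0, 1], the seed, and the helper name `image_to_vector` are assumptions made for illustration.

```python
import numpy as np

MAP_ROWS, MAP_COLS = 32, 32      # map size from the text
N_PIXELS = 64 * 64               # one weight per input pixel

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
# One weight vector w per map coordinate, initialised with random numbers.
weights = rng.random((MAP_ROWS, MAP_COLS, N_PIXELS), dtype=np.float32)

def image_to_vector(gray):
    """Flatten a 64x64 luminance image (0..255) into a vector scaled to [0, 1]."""
    return (np.asarray(gray, dtype=np.float32) / 255.0).reshape(-1)

x = image_to_vector(np.full((64, 64), 128))
```

Storing the whole map as one three-dimensional array keeps the later distance calculations to a single vectorized expression.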

The learning images are a set of expression images captured by photographing a specific person's face while presenting that person with material intended to evoke a predetermined emotion, such as a movie or a photograph. Each learning image is associated with the corresponding predetermined emotion.

The “learning coefficient α” is a coefficient used to update the neuron weights and appears in Equation (1), where i is a number indicating a position on the expression map and x is a vector representing a learning image.
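The image of Equation (1) is not reproduced in this text. As an assumption only, the standard Kohonen update rule written with the symbols defined here (weight vector w at map position i, learning coefficient α, learning image x) would read:

```latex
w_i \leftarrow w_i + \alpha \, (x - w_i)
```

Under this reading, each updated weight vector moves a fraction α of the way toward the learning image x. The patent's actual Equation (1) may differ, for example by including a neighborhood weighting term.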

The “learning radius r” is a setting that specifies a range on the expression map. The expression map generation means 11 uses Equation (1) to update the weight vector w of every neuron contained in the circle of radius r (measured in neurons) centered on the neuron with the highest similarity, described below.

Referring again to FIG. 4, the description of the expression map generation processing continues.

On acquiring a learning image processed by the expression image processing means 10 (step S13), the expression map generation means 11 calculates the similarity between the learning image and each neuron (step S14).

The similarity is expressed, for example, as the Mahalanobis distance or the Euclidean distance between the learning image vector x and the weight vector w of each neuron; the Mahalanobis distance is preferred.
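The two distance measures can be compared with a short sketch. The text does not say how the covariance matrix for the Mahalanobis distance is obtained, so `cov` is simply passed in here as an assumption; the pseudo-inverse is used so that a singular covariance estimate does not raise an error.

```python
import numpy as np

def euclidean(x, w):
    """Euclidean distance between input vector x and weight vector w."""
    return float(np.sqrt(np.sum((x - w) ** 2)))

def mahalanobis(x, w, cov):
    """Mahalanobis distance between x and w for a given covariance matrix."""
    d = x - w
    # Pseudo-inverse tolerates a singular covariance estimate.
    return float(np.sqrt(d @ np.linalg.pinv(cov) @ d))
```

With an identity covariance matrix the two functions return the same value; a larger variance along one axis shrinks the Mahalanobis distance along that axis.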

After the similarity between the input learning image and every neuron has been calculated, the expression map generation means 11 identifies the neuron with the highest similarity (step S15). It then uses Equation (1) to update each element of the weight vector w of every neuron lying within the circle on the expression map that is centered on that neuron and has the learning radius r as its radius (step S16).
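Steps S14 to S16 for a single learning image might look like the sketch below. It assumes the Euclidean distance as the similarity measure, the standard Kohonen update `w += alpha * (x - w)` in place of Equation (1), and a circle that includes its boundary; `train_step` is a hypothetical name.

```python
import numpy as np

def train_step(weights, x, alpha, radius):
    """Update `weights` (rows, cols, dim) in place for one input vector x."""
    rows, cols, _ = weights.shape
    # S14: distance from x to every neuron (smaller = more similar).
    dist = np.linalg.norm(weights - x, axis=2)
    # S15: neuron with the highest similarity.
    bmu = np.unravel_index(np.argmin(dist), dist.shape)
    # S16: update all neurons inside the circle of the learning radius.
    rr, cc = np.indices((rows, cols))
    inside = (rr - bmu[0]) ** 2 + (cc - bmu[1]) ** 2 <= radius ** 2
    weights[inside] += alpha * (x - weights[inside])
    return bmu
```

The function returns the map coordinates of the most similar neuron, which is useful later when regions are labeled.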

The expression map generation means 11 then determines whether learning has been performed with all of the learning images prepared in advance (for example, one image per emotion) (step S17). If some learning images have not yet been used (NO in step S17), it reads the next learning image from a storage device such as a hard disk (step S13) and executes steps S14 to S16, repeating this sequence until learning has been performed with every learning image.

Once learning has been performed with all the learning images (YES in step S17), the expression map generation means 11 compares the number of learning passes performed so far with the preset learning count t (step S18).

If the number of learning passes performed is less than or equal to the set learning count t (NO in step S18), the expression map generation means 11 increments the learning count (step S19) and executes steps S13 to S17 again using all the learning images.

The expression map generation means 11 repeats steps S13 to S17 until the number of learning passes performed exceeds the set learning count t. When it does (YES in step S18), the region division means 12, described below, divides the expression map into regions by emotion.
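The overall loop of FIG. 4 (steps S12 to S19) can be sketched as follows. The starting value of the pass counter is not stated in the text and is assumed to be 1 here; the update rule is again the standard Kohonen rule assumed in place of Equation (1), and α and r are held constant because the text does not mention any decay. The small default sizes are only for quick experimentation.

```python
import numpy as np

def train_map(images, rows=8, cols=8, alpha=0.5, radius=2, t=10, seed=0):
    """Train a map on `images` (n, dim); returns (weights, passes_run)."""
    rng = np.random.default_rng(seed)
    dim = images.shape[1]
    weights = rng.random((rows, cols, dim))                  # S12
    rr, cc = np.indices((rows, cols))
    count = 1      # assumed initial value of the learning counter
    passes = 0
    while True:
        for x in images:                                     # S13 / S17
            dist = np.linalg.norm(weights - x, axis=2)       # S14
            r0, c0 = np.unravel_index(np.argmin(dist), dist.shape)  # S15
            inside = (rr - r0) ** 2 + (cc - c0) ** 2 <= radius ** 2
            weights[inside] += alpha * (x - weights[inside])  # S16
        passes += 1
        if count > t:                                        # S18: YES
            break
        count += 1                                           # S19
    return weights, passes
```

Under the stated assumption about the counter, the loop runs t + 1 passes over the learning images before the map is handed to the region division step.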

Next, the region division means 12 of the control unit 1 is described.

The region division means 12 divides the expression map generated by the expression map generation means 11 into regions, one per emotion (hereinafter, “emotion regions”).

FIG. 5 shows an example of an expression map divided by the region division means 12; each cell of the grid corresponds to one neuron. Based on the emotion information associated with each learning image, the region division means 12 labels each emotion region delimited by the boundaries. For example, it assigns the “anger” label to region R1 and the “joy” label to region R2 of the expression map shown in FIG. 5, and likewise assigns emotion labels to regions R3 to R11.

The region division means 12 first finds, for each of the learning images representing the emotions, the neuron most similar to that image; this neuron represents the emotion and is hereinafter called the “central neuron”. For each neuron that is to be assigned to a region (hereinafter, the “target neuron”), it then calculates the Mahalanobis distance or Euclidean distance to every central neuron and draws the boundaries so that the target neuron falls in the same region of the expression map as the central neuron for which the calculated distance is smallest.
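This nearest-central-neuron assignment can be sketched as below, assuming the Euclidean distance and one representative learning image per emotion; `partition_map` is a hypothetical name.

```python
import numpy as np

def partition_map(weights, class_images):
    """Assign every neuron to the emotion whose central neuron is nearest.

    weights: (rows, cols, dim); class_images: one vector per emotion.
    Returns (region, centres): region[r, c] is an emotion index and
    centres[k] is the (row, col) of emotion k's central neuron.
    """
    rows, cols, dim = weights.shape
    flat = weights.reshape(-1, dim)
    # Central neuron: the neuron most similar to each representative image.
    centres = [int(np.argmin(np.linalg.norm(flat - img, axis=1)))
               for img in class_images]
    # Distance from every neuron's weights to each central neuron's weights.
    d = np.stack([np.linalg.norm(flat - flat[c], axis=1) for c in centres],
                 axis=1)
    region = np.argmin(d, axis=1).reshape(rows, cols)
    return region, [divmod(c, cols) for c in centres]
```

The returned `region` array plays the role of the labeled map in FIG. 5: each entry is the index of an emotion label.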

Alternatively, the region division means 12 may calculate the Mahalanobis distance or Euclidean distance between the weight vectors of each pair of adjacent neurons on the expression map and draw a boundary between any two neurons for which the calculated distance is greater than or equal to a predetermined value.
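A sketch of this adjacent-neuron variant, assuming the Euclidean distance and four-neighbor adjacency (the text does not say whether diagonal neighbors count); `boundaries` is a hypothetical name.

```python
import numpy as np

def boundaries(weights, threshold):
    """Mark boundaries between horizontally and vertically adjacent neurons.

    Returns (right, down): right[r, c] is True when a boundary lies between
    (r, c) and (r, c + 1); down[r, c] when it lies between (r, c) and (r + 1, c).
    """
    right = np.linalg.norm(weights[:, 1:] - weights[:, :-1], axis=2) >= threshold
    down = np.linalg.norm(weights[1:, :] - weights[:-1, :], axis=2) >= threshold
    return right, down
```

Unlike the central-neuron method, this approach needs no representative images, but the regions it outlines still have to be labeled afterwards.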

Alternatively, the region division means 12 may fix in advance, independently of the learning images, the position on the expression map of the central neuron representing each emotion (a central neuron placed this way is called a “fixed central neuron”). It then calculates the Mahalanobis distance or Euclidean distance between each fixed central neuron and the target neuron and draws the boundaries so that the target neuron falls in the same region of the expression map as the fixed central neuron for which the calculated distance is smallest (hereinafter, this division method is called “fixed-central-neuron clustering”). In this case, the positions of the fixed central neurons on the expression map are the same across the plurality of expression maps.

In this way, the region division means 12 can form each emotion region at a common position on the plurality of expression maps without being unduly influenced by a particular expression image of a particular person.

Alternatively, when the distance L on the expression map between a fixed central neuron and a target neuron is greater than or equal to a predetermined distance, the region division means 12 may prevent that target neuron from being placed in the same region of the expression map as that fixed central neuron (hereinafter, this division method is called “conditional clustering”). For example, the distance L between points M1 (k, l) and M2 (m, n) on the expression map is L = √((k − m)² + (l − n)²), and the predetermined distance is set equal to the learning radius r (measured in neurons).

The reason is that a target neuron whose distance on the expression map from a given fixed central neuron is equal to or greater than the learning radius was not influenced by that fixed central neuron during learning. With conditional clustering, the region division means 12 assigns to the region of a fixed central neuron only those target neurons that were influenced by it.

In this way, the region division means 12 can prevent enclaves, protrusions, and the like of the emotion regions from forming on the expression map.
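Conditional clustering with fixed central neurons can be sketched as follows. The Euclidean distance is assumed for the weight-space comparison, and the text does not say what happens to a neuron that lies outside the radius of every fixed central neuron, so such neurons are left unassigned (index −1) here as an assumption.

```python
import numpy as np

def conditional_clustering(weights, centres, radius):
    """centres: list of (row, col) fixed central neurons.

    Returns region[r, c] = index of the assigned centre, or -1 if none.
    """
    rows, cols, _ = weights.shape
    rr, cc = np.indices((rows, cols))
    best = np.full((rows, cols), np.inf)
    region = np.full((rows, cols), -1)
    for k, (r0, c0) in enumerate(centres):
        # Distance in weight space to this fixed central neuron.
        feat = np.linalg.norm(weights - weights[r0, c0], axis=2)
        # Distance L on the map itself, in neurons.
        grid = np.sqrt((rr - r0) ** 2 + (cc - c0) ** 2)
        # Only neurons with L < radius may join this centre's region.
        ok = (grid < radius) & (feat < best)
        region[ok] = k
        best[ok] = feat[ok]
    return region
```

Dropping the `grid < radius` condition reduces this to plain fixed-central-neuron clustering, which makes the effect of the constraint easy to compare on the same map.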

FIG. 6 illustrates conditional clustering by the region division means 12. FIG. 6(A) shows a division obtained by fixed-central-neuron clustering, and FIG. 6(B) shows the same division after it has been improved by conditional clustering. The gray areas in FIG. 6(A) are the parts that conditional clustering improves.

Alternatively, the region division means 12 may divide the expression map by setting the arrangement, shape, and size of each emotion region in advance so that the distances between fixed central neurons and the area of each emotion region (the area on the expression map, specifically the number of neurons in the region) are equal across the expression maps.

The “distance between fixed central neurons” is the distance covered in a transition from one expression to another, where a “transition” is the process of changing from one expression to another. When one expression changes into another, the change follows the path that guarantees the minimum change in terms of energy, and the expression images (neurons) on that path are treated as intermediate images (neurons) between the two expressions. The path is determined by DP (Dynamic Programming) matching or the like.

FIG. 7 is a schematic diagram of an example expression map in which the emotion regions have the same arrangement, shape, and size. E1 to E7 denote the fixed central neurons of the seven emotion regions, and the hexagons of equal shape and size (drawn with dotted lines) show the shape and size of each emotion region.

Each fixed central neuron is placed at the center of its hexagon. The emotion regions are arranged cyclically on the two-dimensional plane in the same order, so that, for example, the emotion region containing fixed central neuron E1 always has the emotion region containing fixed central neuron E7 as its left-hand neighbor.

Making the distances between fixed central neurons equal in this way makes the number of intermediate neurons between each pair of expressions equal, so the emotion estimation apparatus 100 can estimate emotions while treating all expression images evenly, without being unduly influenced by any particular expression.

Next, the emotion estimation means 13 of the control unit 1 is described.

The emotion estimation means 13 estimates a person's emotion from an expression image captured by the imaging unit 2, using the expression map divided by the region division means 12. The flow of the emotion estimation processing is described here with reference to FIG. 8, which is a flowchart of that processing.

First, the emotion estimation means 13 acquires, via the imaging unit 2, the expression image P whose emotion is to be estimated (step S21).

The emotion estimation means 13 then has the expression image processing means 10 process the expression image P so that it can be input to the neural network (step S22), and calculates the similarity between the processed expression image P (treated as a vector) and each neuron on the expression map (step S23). The expression map used here has already been trained by the expression map generation means 11 and divided by the region division means 12.

The emotion estimation means 13 uses the Mahalanobis distance or the Euclidean distance as the measure of similarity between the expression image P and each neuron; the smaller the distance, the higher the similarity to the expression image P.

The emotion estimation means 13 then compares the maximum of the calculated similarities with a predetermined threshold (step S24). If the maximum similarity is greater than or equal to the threshold (YES in step S24), it identifies the region to which the neuron with the maximum similarity belongs, estimates that the label of that region is the emotion of the person in the expression image P, and outputs the result to the display unit 3 (step S25).

If, on the other hand, the maximum similarity is below the predetermined threshold (NO in step S24), the emotion estimation means 13 outputs to the display unit 3 a notice that the emotion of the person in the expression image P cannot be estimated, or that the person in the expression image P is an unspecified person (a person whom the neural network has not learned) (step S26).
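Steps S23 to S26 can be sketched as below. Because the text measures similarity by a distance (smaller is more similar) yet thresholds a similarity value, the mapping `1 / (1 + distance)` is introduced here purely as an assumption; `estimate_emotion` and the example labels are likewise hypothetical.

```python
import numpy as np

def estimate_emotion(weights, region, labels, x, threshold):
    """Return the label of the best-matching neuron's region, or None."""
    dist = np.linalg.norm(weights - x, axis=2)        # S23
    similarity = 1.0 / (1.0 + dist)                   # assumed mapping
    r, c = np.unravel_index(np.argmax(similarity), similarity.shape)
    if similarity[r, c] >= threshold:                 # S24: YES
        return labels[region[r, c]]                   # S25
    return None                                       # S26: cannot estimate

weights = np.array([[[0.0, 0.0], [0.0, 0.0]],
                    [[1.0, 1.0], [1.0, 1.0]]])
region = np.array([[0, 0], [1, 1]])
labels = ["anger", "joy"]
result = estimate_emotion(weights, region, labels, np.array([1.0, 1.0]), 0.5)
```

A return value of `None` corresponds to the case where the apparatus reports that the emotion cannot be estimated or that the person is an unspecified person.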

FIG. 9 shows the distribution of similarity, calculated by the emotion estimation means 13, between the expression image P and each neuron on the expression map; the darker the color, the higher the similarity. As shown in FIG. 9, when the neuron with the maximum similarity lies in the upper right part of region R5 (and the maximum similarity is at or above the predetermined threshold), the emotion estimation means 13 estimates that the emotion of the person in the expression image P is the emotion indicated by the label of region R5 (for example, “surprise”).

Near the boundary between emotion regions, adjacent neurons have relatively high similarity to each other even when they belong to different emotion regions, so the similarity distribution may straddle the boundary. However, the emotion estimation means 13 can estimate the emotion not only from the emotion region to which the neuron with the maximum similarity belongs, but also from the emotion region to which the similarity distribution as a whole belongs, which enables more accurate emotion estimation.
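One way to read "the region to which the similarity distribution as a whole belongs" is to sum the similarities inside each emotion region and pick the region with the largest total. The text does not specify the measure, so the summation rule, the function name `estimate_by_distribution`, and the assumption of non-negative similarities below are illustrative only.

```python
from collections import defaultdict


def estimate_by_distribution(similarities, region_of, labels):
    """Pick the emotion region holding the largest total similarity.

    Assumes non-negative similarities where larger means more similar.
    The per-region sum is an assumed measure, not one given in the text.
    """
    if not similarities:
        return None
    totals = defaultdict(float)
    for sim, region in zip(similarities, region_of):
        totals[region] += sim
    best_region = max(totals, key=totals.get)
    return labels[best_region]
```

With this rule, a single high-similarity neuron just across a boundary can be outweighed by a broad cluster of moderately similar neurons in the neighboring region.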

Next, the region sharing means 14 of the control unit 1 will be described.

The region sharing means 14 is a means for making the emotion regions common across the plurality of facial expression maps that the facial expression map generation means 11 generated for the respective specific persons and, by extension, a means for generating the facial expression maps used to estimate the emotion of an unspecified person from that person's facial expression image.

For example, the region sharing means 14 acquires the plurality of facial expression maps that the facial expression map generation means 11 generated for each specific person, and adjusts them so that the centroid vectors (central neurons) of corresponding emotion regions (for example, the "anger" region of each of the facial expression maps) are at the same position in every map.

The region sharing means 14 may also adjust the position of the central neuron of each corresponding emotion region so that corresponding emotion regions have the same shape or area in each of the plurality of facial expression maps.

In this case, the region sharing means 14 may, for example, use the LBG (Linde-Buzo-Gray) algorithm to determine the position of the central neuron of each emotion region while adjusting the shape or area of each emotion region.
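For orientation, the following is a generic sketch of LBG codebook design (centroid splitting followed by nearest-centroid refinement). It does not show how the apparatus constrains region shape or area, which this passage does not detail; the function name `lbg`, the additive split offset `eps`, and the fixed iteration count are assumptions.

```python
def lbg(points, n_codes, eps=0.01, iters=20):
    """Generic LBG sketch: returns at least n_codes centroids.

    The codebook doubles on every split, so its final size is the
    smallest power of two that is >= n_codes.
    """
    dim = len(points[0])

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def mean(cluster):
        return [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]

    # Start from the centroid of all points.
    codebook = [mean(points)]
    while len(codebook) < n_codes:
        # Split every centroid into two slightly offset copies.
        codebook = [[c + s for c in code] for code in codebook for s in (eps, -eps)]
        # Refine: assign points to the nearest centroid, then recompute means.
        for _ in range(iters):
            clusters = [[] for _ in codebook]
            for p in points:
                nearest = min(range(len(codebook)),
                              key=lambda i: sq_dist(p, codebook[i]))
                clusters[nearest].append(p)
            # An empty cluster keeps its previous centroid.
            codebook = [mean(cl) if cl else code
                        for cl, code in zip(clusters, codebook)]
    return codebook
```

In the setting described here, the returned centroids would play the role of candidate central neuron positions, with `points` being, for example, neuron coordinates or weight vectors; that mapping is likewise an assumption.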

FIG. 10 shows three facial expression maps generated using learning images of three specific persons A, B, and C, and shows that the region sharing means 14 has made the positions of the central neurons CV1 to CV9 of the emotion regions common across the facial expression maps.

Thereafter, the emotion estimation means 13 estimates the emotion of an unspecified person from that person's facial expression image captured by the imaging unit 2, using the plurality of facial expression maps made common by the region sharing means 14.

Note that the emotion estimation apparatus 100 can obtain an effect equivalent to the sharing performed by the region sharing means 14 by using emotion regions that the region division means 12 divided using fixed-central-neuron clustering or conditional clustering; in that case, sharing by the region sharing means 14 is unnecessary.

The flow of the processing by which the emotion estimation means 13 estimates an emotion will now be described with reference to FIG. 11, which is a flowchart showing the flow of the emotion estimation process.

First, the emotion estimation means 13 acquires, via the imaging unit 2, a facial expression image U of the unspecified person whose emotion is to be estimated (step S31).

Thereafter, the emotion estimation means 13 processes the facial expression image U, using the facial expression image processing means 10, so that it can be input to the neural network (step S32), and calculates the similarity between the processed facial expression image U (treated as a vector) and each neuron on the plurality of facial expression maps (either maps in which the positions of the central neurons were determined in advance by the region division means 12, or maps in which the positions of the central neurons were adjusted by the region sharing means 14) (step S33).

The emotion estimation means 13 uses the Mahalanobis distance or the Euclidean distance as the measure of similarity between the facial expression image U and each neuron; the smaller the distance, the higher the similarity to the facial expression image U.
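The two distance measures can be sketched as below. The Mahalanobis variant assumes a diagonal covariance matrix (one variance per vector component) to keep the example dependency-free, and the `similarity` conversion is one possible monotone mapping from distance to a "larger is more similar" score; neither choice is stated in the text.

```python
import math


def euclidean_distance(x, w):
    """Euclidean distance between input vector x and neuron weight w."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))


def mahalanobis_distance_diag(x, w, variances):
    """Mahalanobis distance under an assumed diagonal covariance.

    variances[d] is the variance of component d; a full covariance
    matrix would require a matrix inverse instead.
    """
    return math.sqrt(sum((a - b) ** 2 / v
                         for a, b, v in zip(x, w, variances)))


def similarity(distance):
    """Assumed conversion: smaller distance gives a score closer to 1."""
    return 1.0 / (1.0 + distance)
```

Any strictly decreasing function of the distance would preserve which neuron is the best match; only the numeric value of the threshold in steps S24 and S34 depends on the particular conversion.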

Thereafter, for each facial expression map, the emotion estimation means 13 compares the maximum of the calculated similarities with a predetermined threshold (step S34). If the maximum similarity is equal to or greater than the threshold (YES in step S34), it identifies the emotion region to which the neuron with the maximum similarity belongs (step S35) and stores the value indicated by that emotion region (for example, a number indicating the type of emotion) in a storage device such as a RAM as the value of that facial expression map (hereinafter, the "facial expression map value").

On the other hand, when the maximum similarity is below the predetermined threshold (NO in step S34), the emotion estimation means 13 marks that facial expression map value as invalid (for example, by setting it to zero) (step S36).

Thereafter, the emotion estimation means 13 determines whether similarities have been calculated for all of the facial expression maps held by the emotion estimation apparatus 100 (step S37). If not (NO in step S37), it acquires another facial expression map (step S38), calculates the similarity between the facial expression image U and each neuron in the newly acquired map, and stores the resulting facial expression map value (steps S33 to S36).

When similarities have been calculated for all of the facial expression maps (YES in step S37), the emotion estimation means 13 estimates the emotion of the unspecified person based on the facial expression map values stored in the RAM so far (step S39).

For example, the emotion estimation means 13 may estimate (output) the emotion indicated by the most frequent facial expression map value among all of the facial expression map values as the emotion of the unspecified person. Alternatively, it may extract a predetermined number of facial expression maps containing high-similarity neurons, in descending order of similarity, and output the emotion indicated by the most frequent facial expression map value among those maps as the emotion of the unspecified person.
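The per-map loop and the two voting variants described above might look like the following sketch. The names `map_value` and `vote`, the use of zero as the invalid value (the example given in step S36), the exclusion of invalid maps from the vote, and the tie-breaking behavior (first value encountered wins) are assumptions for illustration.

```python
from collections import Counter

INVALID = 0  # step S36 gives zero as an example of an invalid map value


def map_value(similarities, region_value, threshold):
    """Steps S33-S36 for one map: return (map value, maximum similarity).

    region_value[i] is the emotion number of the region containing
    neuron i; emotion numbers are assumed to be non-zero.
    """
    best = max(range(len(similarities)), key=similarities.__getitem__)
    best_sim = similarities[best]
    value = region_value[best] if best_sim >= threshold else INVALID
    return value, best_sim


def vote(results, top_n=None):
    """Step S39: most frequent valid map value.

    results is a list of (map value, maximum similarity) pairs.
    With top_n set, only the top_n most similar valid maps vote.
    """
    valid = [(v, s) for v, s in results if v != INVALID]
    if top_n is not None:
        valid = sorted(valid, key=lambda r: r[1], reverse=True)[:top_n]
    if not valid:
        return None
    return Counter(v for v, _ in valid).most_common(1)[0][0]
```

Calling `vote(results)` corresponds to the first variant (all maps vote), and `vote(results, top_n=k)` to the second (only the `k` best-matching maps vote).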

With the above configuration, the emotion estimation apparatus 100 can estimate the emotion of a specific person more accurately by training the neural network on facial expression images of that specific person.

In addition, because the emotion estimation apparatus 100 estimates the emotion of a specific person based on the distribution of similarity between that person's facial expression image and each neuron, it can estimate the emotion of the specific person more accurately even when the input facial expression image captures an expression different from those in the learning images.

In addition, the emotion estimation apparatus 100 generates a plurality of facial expression maps based on learning images of a plurality of specific persons, and generalizes them by adjusting the shape, area, centroid vector position, and the like of the emotion regions of each generated map. Here, generalizing means making the position, shape, area, and the like of each emotion region, which differ from person to person, common so that the facial expression characteristics of any one specific person do not unduly influence the emotion estimation process. The apparatus can therefore estimate the emotion of an unspecified person more accurately.

Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to those embodiments, and various modifications and substitutions can be made to them without departing from the scope of the present invention.

For example, in the embodiments described above, when estimating the emotion of an unspecified person using a plurality of facial expression maps of specific persons, the emotion estimation apparatus 100 either determines the shape, area, or centroid vector position of each emotion region in advance with the region division means 12, or adjusts the shape, area, or centroid vector position of each emotion region afterwards with the region sharing means 14. However, the emotion estimation means 13 may estimate the emotion without any such determination or adjustment.

This is because, by extracting a predetermined number of facial expression maps containing high-similarity neurons, in descending order of similarity, and outputting the emotion indicated by the most frequent facial expression map value among those maps as the emotion of the unspecified person, the apparatus can estimate the emotion of the unspecified person using facial expression maps generated from learning images of specific persons whose facial features resemble those of the unspecified person, without using fixed-central-neuron clustering, conditional clustering, or the region sharing means 14.

The emotion estimation apparatus 100 may also be applied to a face authentication system, a system for assessing symptoms of facial neuralgia, or the like, or may be used as a driving support apparatus.

A driving support apparatus using the emotion estimation apparatus 100 trains the neural network on facial expression images of the driver associated with predetermined emotions, and estimates the driver's emotion from facial expression images of the driver captured while driving.

This driving support apparatus performs predetermined driving support based on the estimated emotion; for example, when it estimates (detects) "anger", it limits the accelerator opening to a predetermined value or less.
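As a small illustration of the accelerator example, the function below clamps a requested accelerator opening when the estimated emotion is "anger". The function name, the 0.0 to 1.0 opening scale, and the default limit of 0.5 are hypothetical; the text only says the opening is limited to a predetermined value or less.

```python
def limit_accelerator(requested_opening, emotion, anger_limit=0.5):
    """Clamp the accelerator opening while "anger" is detected.

    requested_opening -- driver's requested opening, assumed 0.0 to 1.0
    anger_limit       -- hypothetical stand-in for the predetermined value
    """
    if emotion == "anger":
        return min(requested_opening, anger_limit)
    return requested_opening
```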

FIG. 1 is a block diagram showing a configuration example of the emotion estimation apparatus according to the present invention.
FIG. 2 is a flowchart showing the flow of the facial expression image processing.
FIG. 3 shows facial expression images for explaining each step of the facial expression image processing.
FIG. 4 is a flowchart showing the flow of the facial expression map generation process.
FIG. 5 is a diagram showing an example of a facial expression map generated by the facial expression map generation means.
FIG. 6 is a diagram for explaining conditional clustering by the region division means.
FIG. 7 is a schematic diagram showing an example of facial expression maps in which the arrangement, shape, and size of each emotion region are made equal.
FIG. 8 is a flowchart showing the flow of the emotion estimation process.
FIG. 9 is a diagram showing the distribution of similarity between a facial expression image and each neuron on a facial expression map.
FIG. 10 shows three facial expression maps generated using learning images of three specific persons.
FIG. 11 is a flowchart showing the flow of the emotion estimation process.

Explanation of symbols

1 Control unit
2 Imaging unit
3 Display unit
10 Facial expression image processing means
11 Facial expression map generation means
12 Region division means
13 Emotion estimation means
14 Region sharing means
100 Emotion estimation apparatus
E1 to E7 Fixed central neurons
CV1 to CV9 Central neurons
R1 to R11 Regions

Claims

1. An emotion estimation apparatus comprising:
facial expression map generation means for generating facial expression maps by a neural network by learning facial expression images of specific persons associated with predetermined emotions;
region division means for dividing the facial expression map generated by the facial expression map generation means for each of the specific persons into a plurality of regions based on the predetermined emotions; and
emotion estimation means for estimating an emotion of an unspecified person based on a facial expression image of the unspecified person and the plurality of facial expression maps having the regions divided by the region division means.

2. The emotion estimation apparatus according to claim 1, wherein the region division means determines in advance at least one of the center position, shape, or area of each region to be divided.

3. The emotion estimation apparatus according to claim 1, further comprising region sharing means for making corresponding regions, among the regions divided by the region division means, common across the plurality of facial expression maps,
wherein the emotion estimation means estimates the emotion of the unspecified person based on the facial expression image of the unspecified person and the plurality of facial expression maps having the regions made common by the region sharing means.

4. The emotion estimation apparatus according to claim 3, wherein the region sharing means makes at least one of the shape, center position, or area of the corresponding regions common.