JP2016167164A - Image recognition device, image recognition method and image recognition program

Info

Publication number: JP2016167164A
Application number: JP2015046510A
Authority: JP
Inventors: Yuji Kaneda; Hiroshi Sato; Daisuke Nishino
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-03-09
Filing date: 2015-03-09
Publication date: 2016-09-15
Anticipated expiration: 2035-03-09
Also published as: JP6626259B2

Abstract

PROBLEM TO BE SOLVED: To perform highly accurate face authentication on a face image in which the face is captured at a small size, is blurred, or the like.

SOLUTION: An image recognition device comprises: setting means which compares the whole face area of an input image with the whole face area of each of a plurality of registered images, calculates the face similarity between the input image and the registered image, searches for the registered image corresponding to the input image on the basis of the face similarity, and sets a face organ position stored in association with the corresponding registered image as the face organ position of the input image; feature extraction means which extracts a feature quantity of each face organ of the input image on the basis of the face organ position set by the setting means and extracts a feature quantity of each face organ of each registered image on the basis of the stored face organ position; and face identification means which calculates the face organ similarity between the feature quantity of each face organ of the input image and the feature quantity of each face organ of each registered image and identifies whether the face captured in the input image and the face captured in each registered image are the face of the same person on the basis of the calculated face organ similarity.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an image recognition apparatus, an image recognition method, and an image recognition program.

Alignment techniques, which locate an object in an image and then normalize its size, are very important in image recognition in general. For example, to normalize the position and size of a face, there are techniques such as that of Non-Patent Document 6, which extract finer feature points on the face surface, such as the outer and inner corners of the eyes, and normalize the size and position of the face based on the extracted feature points.

Among the image recognition techniques built on such alignment is face authentication, which identifies whose face appears in a video. For example, as in Non-Patent Document 1, a feature quantity called Local Binary Pattern (hereinafter, LBP feature) is extracted from an input luminance face image in which the position and size of the face have been normalized. The feature quantity extracted from the input luminance face image is then compared with feature quantities extracted from luminance face images registered in advance to identify whose face was input.

Note that face authentication becomes more robust to variations such as face orientation when the regions from which feature quantities are extracted are set relative to finer feature points on the face surface, such as the outer and inner corners of the eyes, rather than by dividing the image evenly. Such face authentication techniques have so far been used under relatively favorable shooting conditions, such as a short distance from the imaging device to the subject, as seen in the auto shutter of digital cameras and in entry/exit management.

In recent years, research has been conducted on face authentication for low-resolution face images, such as small distant faces or blurred faces captured by surveillance cameras, and one approach is to increase the resolution of the image. That is, the low-resolution face image is converted to a higher resolution in preprocessing to recover information, and face authentication is then performed. As a resolution enhancement technique, for example, a technique called hallucination, which approximates the face of one person by a linear sum of the faces of other people, has been proposed, as in Non-Patent Document 2.

[Non-Patent Document 1] T. Pajdla and J. Matas, "Face Recognition with Local Binary Patterns", ECCV, pp. 469-481, 2004
[Non-Patent Document 2] K. Huang, R. Hu, "Face hallucination via K-selection mean constrained sparse representation", ICIP, pp. 882-885, 2012
[Non-Patent Document 3] M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86, 1991
[Non-Patent Document 4] B. Li, H. Chang, "Hallucinating Facial Images and Features", ICPR, pp. 1-4, 2008
[Non-Patent Document 5] P. Viola, M. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features", in Proc. of CVPR, vol. 1, pp. 511-518, December 2001
[Non-Patent Document 6] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active Shape Models - Their Training and Application", Computer Vision and Image Understanding, Vol. 61, No. 1, pp. 38-59, January 1995
[Non-Patent Document 7] I. Kemelmacher-Shlizerman, "3D Face Reconstruction from a Single Image Using a Single Reference Face Shape", PAMI, pp. 394-405, 2011

Resolution enhancement techniques such as the hallucination technique described above divide an image into a plurality of blocks and enhance the resolution block by block, so the position and size of the face must be normalized. However, a low-resolution image before enhancement has lost a large amount of information, so the positions of the face, eyes, and so on cannot be expected to be detected accurately. The present invention has been made in view of the above problem, and its object is to provide a technique that can perform accurate alignment even for face images containing small distant faces, blur, or noise, and can thereby realize highly accurate face authentication.

To achieve the above object, the image recognition apparatus of the present invention comprises:
registration means for storing a plurality of registered images in association with the face organ positions of the face organs shown in the registered images;
setting means for comparing the entire face area of an input image with the entire face area of each of the plurality of registered images to calculate a face similarity between the input image and the registered image, searching for the registered image corresponding to the input image based on the face similarity, and setting the face organ position stored in association with the corresponding registered image as the face organ position of the input image;
feature extraction means for extracting a feature quantity of each face organ of the input image based on the face organ position set by the setting means, and extracting a feature quantity of each face organ of each registered image based on the face organ position stored by the registration means; and
face identification means for calculating a face organ similarity between the feature quantity of each face organ of the input image and the feature quantity of each face organ of each registered image extracted by the feature extraction means, and identifying, based on the calculated face organ similarity, whether the face shown in the input image and the face shown in each registered image are the face of the same person.

According to the present invention, accurate alignment can be performed even for face images containing small distant faces, blur, or noise, and highly accurate face authentication can be realized.

A block diagram showing the configuration of the image recognition apparatus of the first embodiment.
A flowchart showing the overall processing and the processing in the face image registration mode of the first embodiment.
A flowchart showing the processing in the face identification mode of the first embodiment.
A flowchart showing details of the face organ position setting process in step S1401.
A diagram explaining the converted image generation process in step S1411.
A diagram explaining the process of searching for a pair of a converted image and a registered image in step S1412.
A diagram explaining the process of setting the face organ position of the registered image as the face organ position of the converted image in step S1413.
A diagram explaining the process of calculating, in step S1701, the similarity between the face organ feature quantities of the converted image and those of the registered image.
A block diagram showing the configuration of the image recognition apparatus of the second embodiment.
A flowchart showing the processing in the face identification mode of the second embodiment.
A diagram explaining the relationship among the registered image, the converted image, the resolution-enhanced converted image, and the registered image in the second embodiment.
A diagram showing the breakdown of a high-resolution face image generated from a low-resolution face image by the hallucination technique.
A block diagram showing the configuration of the image recognition apparatus of the third embodiment.
A flowchart showing the processing in the face identification mode of the third embodiment.
A flowchart showing in detail the processing from step S2401 to step S2801 of the third embodiment.
A diagram showing the features of the third embodiment.
A diagram explaining the relationship among the registered image, the converted image, and the twice-converted image in the third embodiment.
A diagram explaining the relationship among the twice-converted image, the twice-converted high-resolution image, and the registered image in the third embodiment.
A diagram explaining the relationship between the twice-converted high-resolution image and the registered image in the third embodiment.
A diagram showing an example of the hardware configuration of the image recognition apparatus according to the first to third embodiments of the present invention.

Looking at face authentication processing as a whole, face authentication includes a registration process in which face images of the persons to be identified are registered in advance. The face images registered in this process are often captured on the spot or obtained from previously captured images via an interface such as flash memory, and are therefore often relatively high-resolution face images.
In the present invention, alignment is performed not by applying a face feature point detection technique to the input low-resolution face image as in the conventional approach, but by applying the face feature points detected from the face images registered in the registration process to the input low-resolution face image.
In addition, by incorporating this technique, which is robust to positional deviation, into the resolution enhancement technique, face authentication accuracy for low-resolution face images can be greatly improved.

[First Embodiment]
In the first embodiment, the face organ positions detected from a face image registered in the registration process are set as the face organ positions of the input low-resolution face image, and face recognition is then executed. The details are described below.

＜Hardware configuration＞
FIG. 14 shows an example of the hardware configuration of the image recognition apparatus in the present embodiment. The image recognition apparatus 1400 includes a CPU (Central Processing Unit) 1401, a ROM (Read Only Memory) 1402, and a RAM (Random Access Memory) 1403. It further includes a secondary storage device 1404, a display unit 1405, an operation unit 1406, a network communication unit 1407, a network connection unit 1408, a USB communication unit 1409, a USB connection unit 1410, and a connection bus 1411.
The CPU 1401 controls the entire apparatus by executing control programs stored in the ROM 1402 and the RAM 1403.

The ROM 1402 is a nonvolatile memory and stores control programs and various parameter data. The control programs are executed by the CPU 1401 and cause the apparatus to function as the means for executing each process described later.
The RAM 1403 is a volatile memory and temporarily stores image data, control programs, and their execution results.

The secondary storage device 1404 is a rewritable secondary storage device such as a hard disk or flash memory, and stores an OS (Operating System), application programs, image data, and the like.
The CPU 1401 reads the programs and the OS stored in the secondary storage device 1404 into the RAM 1403. By executing the programs on the RAM 1403, the various functions of the image recognition apparatus can be realized.

Note that a program may be executed by a single processor or by a plurality of processors working together. A dedicated circuit (ASIC) for executing a specific process may also be provided, in which case that process is executed by the dedicated circuit.
Software (a program) describing the processing described later may also be acquired via a network or various storage media and then executed.

The display unit 1405 consists of a display device such as an LCD. The operation unit 1406 consists of input devices such as a keyboard and a mouse. The network communication unit 1407 connects the image recognition apparatus to a network and performs various kinds of communication. The network connection unit 1408 connects the network communication unit 1407 to a network medium.

The network communication unit 1407 and the network connection unit 1408 support at least one of wired LAN and wireless LAN. Their specific form and functions depend on the type of LAN supported. The USB communication unit 1409 communicates with various peripheral devices via a USB interface. The USB connection unit 1410 consists of a USB connector.
The connection bus 1411 connects the CPU 1401, the ROM 1402, the RAM 1403, the secondary storage device 1404, and the other units so that they can exchange data with one another.

＜Functional configuration＞
FIG. 1 is a block diagram showing an example of the functional configuration of the image recognition apparatus in the first embodiment. As shown in FIG. 1, the functional configuration of the present embodiment includes an image acquisition unit 110, a face position detection unit 120, a face organ position detection unit 130, a face organ position setting unit 140, a feature extraction unit 150, a face image registration unit 160, and a face identification unit 170.

The image acquisition unit 110 acquires a plurality of images for registration. An image for registration, or an image that has been registered, is also referred to as a registered image.
The image acquisition unit 110 also acquires the image that is to be checked against the registered images (hereinafter also referred to as the "input image").
The face position detection unit 120 detects the position of the face shown in the input image and in each registered image.

For a registered image, the face organ position detection unit 130 detects the position of each face organ (right eye, left eye, mouth, and so on) shown in the registered image.
For the input image, the face organ position detection unit 130 does not detect the positions of the face organs shown in the input image.

The face organ position setting unit 140 compares the entire face area of the input image with the entire face area of each of the plurality of registered images, calculates the face similarity between the input image and each registered image, and searches for the registered image that pairs with the input image. Here, it searches for the registered image with the highest face similarity to the input image. It then sets the face organ positions stored in association with that registered image as the face organ positions of the input image. Because the purpose of this search is to set the face organ positions that serve as the reference for feature extraction, the search may instead look for a registered image whose face similarity exceeds a predetermined threshold rather than the one with the highest face similarity. The same applies to the searches described below.
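As a minimal sketch of the search just described, assuming the registered images are held as dictionaries with hypothetical keys `image` and `organ_positions` and that the caller supplies the whole-face similarity function, the setting step might look as follows in Python. The function name `set_face_organ_positions` and the fallback behavior when no image exceeds the threshold are illustrative assumptions, not part of the disclosure.

```python
def set_face_organ_positions(input_face, registered, similarity, threshold=None):
    """Pick a registered image for the input face and return its organ positions.

    registered: list of dicts with hypothetical keys "image" and "organ_positions".
    similarity: callable(input_face, registered_image) -> float (higher is more similar).
    threshold:  if given, the first registered image scoring above it is accepted;
                if none does, the best-scoring image is used (an assumption).
    """
    best_entry, best_score = None, float("-inf")
    for entry in registered:
        score = similarity(input_face, entry["image"])
        if threshold is not None and score > threshold:
            # Threshold variant: any sufficiently similar registered image will do.
            return dict(entry["organ_positions"]), score
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is None:
        raise ValueError("no registered images")
    # Highest-similarity variant: reuse the stored positions for the input face.
    return dict(best_entry["organ_positions"]), best_score
```

The positions are copied rather than detected, which is the point of the approach: no feature point detector is ever run on the low-resolution input.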

Preferably, the face organ position setting unit 140 also:
generates, from the input image, a plurality of converted images that differ in at least one of face size, face position, and face orientation, and calculates the face similarity between each of the generated converted images and each of the plurality of registered images;
searches for the pair of converted image and registered image with the highest face similarity; and
sets the face organ positions stored in association with the registered image of that pair as the face organ positions of the converted image of that pair.

For a registered image, the feature extraction unit 150 extracts the feature quantity of each face organ of the registered image based on the face organ positions detected by the face organ position detection unit 130.
For the input image, the feature extraction unit 150 extracts the feature quantity of each face organ from the converted image of the input image based on the face organ positions set by the face organ position setting unit 140.

The face image registration unit 160 stores a plurality of registered images, each in association with the positions of the face organs shown in that registered image and with the feature quantities of those face organs (the feature quantity of the right eye, of the left eye, of the mouth, and so on).
For example, the first registered image is stored in association with:
the positions of the right eye, the left eye, and the mouth shown in the first registered image; and
the feature quantities of the right eye, the left eye, and the mouth shown in the first registered image.
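The association described above can be pictured as one record per registered image. The following Python sketch is only an illustration: the class names `RegisteredFace` and `FaceRegistry`, the field names, and the consistency check are all assumptions introduced here, and the personal ID field reflects the ID that is stored at registration time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RegisteredFace:
    """One registered image with its per-organ positions and feature quantities."""
    person_id: str
    image: List[List[int]]                       # normalized grayscale image
    organ_positions: Dict[str, Tuple[int, int]]  # e.g. {"right_eye": (x, y)}
    organ_features: Dict[str, List[float]] = field(default_factory=dict)


class FaceRegistry:
    """Hypothetical in-memory store standing in for the registration unit."""

    def __init__(self) -> None:
        self._entries: List[RegisteredFace] = []

    def register(self, entry: RegisteredFace) -> None:
        # Every stored feature must belong to an organ whose position is known.
        missing = set(entry.organ_features) - set(entry.organ_positions)
        if missing:
            raise ValueError(f"features without positions: {sorted(missing)}")
        self._entries.append(entry)

    def entries(self) -> List[RegisteredFace]:
        return list(self._entries)
```

Leaving `organ_features` empty at registration corresponds to the variant in which features are extracted from registered images only at identification time.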

The face identification unit 170 calculates the face organ similarity between the feature quantity of each face organ of the input image extracted by the feature extraction unit 150 and the feature quantity of each face organ of each registered image, which was extracted by the feature extraction unit 150 and is stored by the face image registration unit 160. Based on the calculated face organ similarity, it then identifies whether the face shown in the input image and the face shown in each registered image are the face of the same person.
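This passage does not fix the similarity measure or the way per-organ similarities are combined, so the following Python sketch makes two explicit assumptions: histogram intersection as the per-organ similarity, and a simple mean over organs compared against a threshold. The names `histogram_intersection`, `identify`, and the default threshold value are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two histograms (assumed measure)."""
    total = sum(h1)
    if total == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(h1, h2)) / total


def identify(input_features, registered_features, threshold=0.8):
    """Return (person_id or None, best mean face organ similarity).

    input_features:      dict organ name -> histogram for the input image.
    registered_features: dict person_id -> dict organ name -> histogram.
    """
    best_id, best_score = None, 0.0
    for person_id, feats in registered_features.items():
        organs = [o for o in input_features if o in feats]
        if not organs:
            continue
        # Mean of per-organ similarities (aggregation rule is an assumption).
        score = sum(
            histogram_intersection(input_features[o], feats[o]) for o in organs
        ) / len(organs)
        if score > best_score:
            best_id, best_score = person_id, score
    return (best_id if best_score >= threshold else None), best_score
```

Returning `None` below the threshold models the case in which the input face matches none of the registered persons.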

FIG. 2 and FIG. 3 show the overall flow of the first embodiment of the present invention, and the first embodiment is described in detail below with reference to this flow.
In step S1001 of FIG. 2(a), it is first determined whether the registration mode is selected. If the registration mode is selected, the process proceeds to the face image registration mode.

＜Face image registration mode＞
In step S1101 of FIG. 2(b), the image acquisition unit 110 acquires a registered image. The image acquisition unit 110 acquires digital image data obtained by passing light through a condensing element such as a lens, an imaging element such as a CMOS or CCD sensor that converts light into an electrical signal, and an AD converter that converts the analog signal into a digital signal. By performing thinning or similar processing, it can also acquire a face image converted to, for example, VGA (640 x 480 pixels) or QVGA (320 x 240 pixels). Registered images can also be acquired by means other than shooting, for example through flash memory. Accordingly, relatively high-resolution face images are registered as registered images.

In step S1201, the face position detection unit 120 detects the centroid positions of the face, the left and right eyes, the mouth, and so on, using a technique such as that of Non-Patent Document 5.
In step S1202, a first normalized image, in which the face has a predetermined size and is upright, is generated from the centroid positions of the face, the left and right eyes, the mouth, and so on detected in step S1201, using an affine transformation or the like. One way to define the size of the face is as the Euclidean distance between the left and right eyes.
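One possible form of the normalization in step S1202 is a similarity transform (a special case of an affine transform) computed from the two eye centers, using the inter-eye Euclidean distance as the face size. The Python sketch below is illustrative only: the function names, the target eye distance of 40 pixels, and the target midpoint are assumptions chosen for the example.

```python
import math


def eye_alignment_transform(eye_a, eye_b, target_dist=40.0, target_center=(50.0, 40.0)):
    """Return a 2x3 matrix that makes the eye line horizontal and of fixed length.

    eye_a, eye_b: (x, y) centers of the eye on the image-left and image-right side.
    The eye midpoint is mapped to target_center (hypothetical default values).
    """
    (x1, y1), (x2, y2) = eye_a, eye_b
    dx, dy = x2 - x1, y2 - y1
    dist = math.hypot(dx, dy)          # face size as the inter-eye distance
    if dist == 0:
        raise ValueError("eye positions coincide")
    scale = target_dist / dist
    angle = -math.atan2(dy, dx)        # rotation that levels the eye line
    a = scale * math.cos(angle)
    b = scale * math.sin(angle)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    tx = target_center[0] - (a * cx - b * cy)
    ty = target_center[1] - (b * cx + a * cy)
    return [[a, -b, tx], [b, a, ty]]


def apply_transform(m, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Applying the resulting matrix to every pixel coordinate (or its inverse to every output coordinate) yields a face image with a fixed size and upright orientation.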

In step S1301, the face organ position detection unit 130 detects finer face organ positions, such as the outer and inner corners of the eyes, in the first normalized image generated in step S1202, using a technique such as that of Non-Patent Document 6.
In step S1302, a second normalized image, in which the face has a predetermined size and is upright, is generated using the centroid positions of the finer feature points, such as the outer and inner corners of the eyes, detected in step S1301.

In step S1501, feature extraction regions are set in the second normalized image generated in step S1302 based on the face organ positions detected in step S1301, and LBP features such as those of Non-Patent Document 1 are extracted from those regions.
In step S1601, the face image registration unit 160 stores the personal ID, the face organ positions detected in step S1301, the second normalized image generated in step S1302, and the feature quantities generated in step S1501 in memory or the like.
Note that conventional face authentication generally stores only the personal ID and the feature quantities in step S1601, and does not store the face organ positions detected in step S1301 or the second normalized image generated in step S1302.
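To illustrate the kind of feature extracted in step S1501, the following Python sketch computes the basic 3x3 LBP code per pixel and a 256-bin histogram over a square region centered on a face organ position. The region size (`half`), the neighbor ordering, and the function names are assumptions for illustration; the text does not specify how the extraction regions are sized.

```python
def lbp_code(img, x, y):
    """Basic 8-neighbor LBP code of pixel (x, y); img is a list of rows."""
    center = img[y][x]
    # Neighbors visited clockwise starting at the top-left pixel.
    neighbors = [(-1, -1), (0, -1), (1, -1), (1, 0),
                 (1, 1), (0, 1), (-1, 1), (-1, 0)]
    code = 0
    for bit, (dx, dy) in enumerate(neighbors):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img, cx, cy, half=4):
    """256-bin LBP histogram over a square region centered on (cx, cy).

    The region is clipped so that every pixel has a full 3x3 neighborhood.
    """
    h, w = len(img), len(img[0])
    hist = [0] * 256
    for y in range(max(1, cy - half), min(h - 1, cy + half + 1)):
        for x in range(max(1, cx - half), min(w - 1, cx + half + 1)):
            hist[lbp_code(img, x, y)] += 1
    return hist
```

One histogram per face organ (right eye, left eye, mouth, and so on) would then serve as that organ's feature quantity.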

The above is the registration process performed in the face image registration mode. In the description above, feature extraction is performed on the normalized image in step S1501, and the extracted feature quantities are also stored in memory in step S1601. Alternatively, only the personal ID, the face organ positions detected in step S1301, and the normalized image generated in step S1302 may be stored, without performing feature extraction in the registration process. In that case, in the face identification mode, features are extracted from the registered images as well as from the input image.

＜Face identification mode＞
If the face identification mode is selected in step S1001 of FIG. 2(a), the process proceeds to the face identification mode.
The processing from step S1102 to step S1204 in FIG. 3 is the same as the processing from step S1101 to step S1202 in the face image registration mode, so its description is omitted. In the first embodiment of the present invention, however, the face image acquired in the face identification mode is assumed to be a low-resolution face image, such as a small distant face or a blurred face.

Step S1401 is the face organ position setting process performed by the face organ position setting unit 140, which is the key point of the present embodiment. It is described below.
FIG. 4 shows the detailed processing flow of the face organ position setting process in step S1401.
FIG. 5 is a diagram explaining the converted image generation process in step S1411.
In step S1411, as shown in FIG. 5, a plurality of converted images with different scales (sizes), shift amounts, and rotation amounts are generated from the first normalized image generated in step S1204. These converted images are generated using, for example, an affine transformation.
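The generation of converted images in step S1411 can be sketched as a sweep over scale, rotation, and shift parameters, each applied as an affine warp about the image center. The Python example below is a minimal illustration using nearest-neighbor sampling on a list-of-rows grayscale image; the parameter grids, the fill value, and the function names `warp_affine` and `generate_converted_images` are assumptions, not values from the disclosure.

```python
import itertools
import math


def warp_affine(img, scale, angle_deg, shift, fill=0):
    """Scale and rotate about the image center, then shift by (dx, dy).

    Uses inverse mapping with nearest-neighbor sampling; pixels that map
    outside the source image receive the fill value.
    """
    h, w = len(img), len(img[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the shift, then undo the rotation and the scale.
            ux, uy = x - shift[0] - cx, y - shift[1] - cy
            sx = (cos_t * ux + sin_t * uy) / scale + cx
            sy = (-sin_t * ux + cos_t * uy) / scale + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def generate_converted_images(
    img,
    scales=(0.9, 1.0, 1.1),
    angles=(-5, 0, 5),
    shifts=((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)),
):
    """Return a list of ((scale, angle, shift), converted_image) pairs."""
    return [
        ((s, a, d), warp_affine(img, s, a, d))
        for s, a, d in itertools.product(scales, angles, shifts)
    ]
```

With the default grids this yields 45 converted images per input, each labeled with the parameters that produced it so that the best-matching variant can be identified later.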

図６Ａは、ステップＳ１４１２における変換画像と登録画像とのペアを探索する処理を解説した図である。
ステップＳ１４１２では、図６Ａに示すようにステップＳ１４１１で生成されたスケール、シフト量、回転量の異なる複数の変換画像と、登録画像とのマッチングを行うことで最も類似度の高くなる変換画像と登録画像とのペアを探索する。マッチングには、例えば、正規化相互相関などを用いる。 FIG. 6A is a diagram illustrating processing for searching for a pair of a converted image and a registered image in step S1412.
In step S1412, as shown in FIG. 6A, the plurality of converted images having different scales, shift amounts, and rotation amounts generated in step S1411 are matched against the registered images in order to find the pair of a converted image and a registered image that has the highest similarity. For the matching, for example, normalized cross-correlation is used.
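
The pair search of step S1412 can be sketched as an exhaustive comparison using normalized cross-correlation, which the text names as one possible matching measure. The sketch assumes that every image has already been resampled to the same size and flattened into a list of gray values; the function names are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equally sized images (flat lists)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # flat image: correlation is undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom

def best_pair(converted_images, registered_images):
    """Return (converted_index, registered_index, score) of the most similar pair."""
    best = (-1, -1, -2.0)  # NCC is never below -1, so any real score wins
    for i, conv in enumerate(converted_images):
        for j, reg in enumerate(registered_images):
            score = ncc(conv, reg)
            if score > best[2]:
                best = (i, j, score)
    return best
```

The registered index of the winning pair identifies the registered image whose stored facial organ positions are reused in step S1413.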

なお、スケール変換によって複数の変換画像が生成される。シフト変換によっても、回転変換によっても複数の変換画像が生成される。また、スケール変換、シフト変換、回転変換は適宜組み合わせても良い。スケール変換し、かつシフト変換しても良い。また、回転変換し、かつスケール変換し、かつシフト変換しても良い。 A plurality of converted images are generated by scale conversion. A plurality of converted images are generated by both shift conversion and rotation conversion. Further, scale conversion, shift conversion, and rotation conversion may be appropriately combined. Scale conversion and shift conversion may be performed. Further, rotation conversion, scale conversion, and shift conversion may be performed.

図６Ｂは、ステップＳ１４１３における登録画像の顔器官位置を変換画像の顔器官位置に設定する処理を解説した図である。
ステップＳ１４１３では、図６Ｂに示すようにステップＳ１４１２で探索された登録画像（図６Ｂの例では第１の登録画像）の顔器官位置を、変動画像（図６Ｂの例では第１の変換画像）の顔器官位置として設定する。なお、登録画像の顔器官位置は上述のようにステップＳ１３０１での顔器官位置検出の結果である。 FIG. 6B is a diagram illustrating processing for setting the facial organ position of the registered image to the facial organ position of the converted image in step S1413.
In step S1413, as shown in FIG. 6B, the facial organ position of the registered image found in step S1412 (the first registered image in the example of FIG. 6B) is set as the facial organ position of the converted image (the first converted image in the example of FIG. 6B). Note that the facial organ position of the registered image is the result of the facial organ position detection in step S1301, as described above.

従来の顔認証では、ステップＳ１２０４で生成された第１の正規化画像に対してステップＳ１３０１と同様に顔器官位置検出を行っていた。しかしながら、ステップＳ１２０４で生成された第１の正規化画像が低解像画像である場合には、顔器官位置検出を実施しても正しい位置が得られない。但し、低解像画像であっても顔の輪郭情報など比較的に低周波成分の情報だけは残されている。 In conventional face authentication, face organ position detection is performed on the first normalized image generated in step S1204 as in step S1301. However, if the first normalized image generated in step S1204 is a low-resolution image, a correct position cannot be obtained even if face organ position detection is performed. However, only relatively low-frequency component information such as face contour information remains even in a low-resolution image.

従って、本実施形態では、ステップＳ１２０４で生成された第１の正規化画像に対しては顔器官位置検出を実施しない。その代わりに、ステップＳ１２０４で生成された第１の正規化画像との顔全体の見た目のマッチングを行うことで、変換画像と最も類似度の高い登録画像を探索し、最も類似度が高い登録画像の顔器官位置検出結果を、変換画像の顔器官位置として利用する。 Accordingly, in the present embodiment, facial organ position detection is not performed on the first normalized image generated in step S1204. Instead, the appearance of the entire face in the first normalized image generated in step S1204 is matched against the registered images to find the registered image most similar to the converted image, and the facial organ position detection result of that registered image is used as the facial organ position of the converted image.

ステップＳ１５０２では、ステップＳ１４１３で設定された登録画像の顔器官位置検出の結果を利用して、特徴抽出領域を設定し、その領域に対して特徴抽出を行う。
ステップＳ１７０１では、ステップＳ１５０２で抽出された特徴量と、ステップＳ１６０１で記憶された特徴量との類似度（顔器官類似度）を算出し、個人を識別する。
例えば、図６Ｃに示すように、第１の変換画像の各顔器官と第１の登録画像の各顔器官の類似度、第１の変換画像の各顔器官と第２の登録画像の各顔器官の類似度を算出する。そして、算出された顔器官類似度に基づいて、第１の入力画像に写っている人物と第１の登録画像に写っている人物とが同一人物か、第１の入力画像に写っている人物と第２の登録画像に写っている人物とが同一人物かを識別する。 In step S1502, a feature extraction region is set using the result of face organ position detection of the registered image set in step S1413, and feature extraction is performed on the region.
In step S1701, the degree of similarity (facial organ similarity) between the feature amount extracted in step S1502 and the feature amount stored in step S1601 is calculated to identify an individual.
For example, as shown in FIG. 6C, the similarity between each facial organ of the first converted image and each facial organ of the first registered image is calculated, as is the similarity between each facial organ of the first converted image and each facial organ of the second registered image. Then, based on the calculated facial organ similarities, it is identified whether the person shown in the first input image is the same person as the person shown in the first registered image, or the same person as the person shown in the second registered image.
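
The identification in step S1701 can be sketched as follows. The publication does not fix the similarity measure, the way organ similarities are combined, or a decision threshold at this point, so the cosine similarity, the mean over organs, and the threshold value below are all assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def organ_similarities(input_features, registered_features):
    """Similarity per facial organ; both arguments map organ name -> feature vector."""
    return {
        organ: cosine_similarity(input_features[organ], registered_features[organ])
        for organ in input_features
    }

def identify(input_features, gallery, threshold=0.8):
    """Return the registered ID with the highest mean organ similarity,
    or None when no registered image reaches the (hypothetical) threshold."""
    best_id, best_score = None, threshold
    for person_id, reg_features in gallery.items():
        sims = organ_similarities(input_features, reg_features)
        score = sum(sims.values()) / len(sims)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id
```

Here `gallery` stands in for the feature amounts stored in step S1601, keyed by a hypothetical person identifier.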

［第２の実施形態］
第２の実施形態では、入力画像の顔器官位置を設定した後に、入力画像の変換画像を高解像化する。そして、高解像化した変換画像の各顔器官と、登録画像の各顔器官の類似度を算出し、算出された類似度（顔器官類似度）に基づいて、入力画像に写っている人物と登録画像に写っている人物が同一人物かを識別する。 [Second Embodiment]
In the second embodiment, after the facial organ position of the input image is set, the converted image of the input image is converted to a higher resolution. Then, the similarity between each facial organ of the high-resolution converted image and each facial organ of the registered image is calculated, and, based on the calculated similarity (facial organ similarity), it is identified whether the person shown in the input image and the person shown in the registered image are the same person.

＜機能構成＞
図７Ａは、第２の実施形態における画像認識装置の機能構成の一例を示したブロック図である。図７Ａに示すように、本実施形態の機能構成２００は、画像取得部１１０、顔位置検出部１２０、顔器官位置検出部１３０、顔器官位置設定部１４０、特徴抽出部１５１、顔画像登録部１６０、及び顔識別部１７１、並びに高解像顔画像生成部２１０を含む。 <Functional configuration>
FIG. 7A is a block diagram illustrating an example of a functional configuration of the image recognition apparatus according to the second embodiment. As shown in FIG. 7A, the functional configuration 200 of the present embodiment includes an image acquisition unit 110, a face position detection unit 120, a face organ position detection unit 130, a face organ position setting unit 140, a feature extraction unit 151, a face image registration unit 160, a face identification unit 171, and a high-resolution face image generation unit 210.

図７Ａに示す第２の実施形態における画像認識装置２００と、図１に示す第１の実施形態における画像認識装置１００との違いは以下のとおりである。
画像認識装置２００は高解像顔画像生成部２１０を有するが、画像認識装置１００は高解像顔画像生成部２１０に相当する機能部は有しない。
高解像顔画像生成部２１０は、顔類似度が最も高くなる変換画像から高解像顔画像を生成する。 The difference between the image recognition apparatus 200 in the second embodiment shown in FIG. 7A and the image recognition apparatus 100 in the first embodiment shown in FIG. 1 is as follows.
The image recognition apparatus 200 includes a high-resolution face image generation unit 210, but the image recognition apparatus 100 does not include a functional unit corresponding to the high-resolution face image generation unit 210.
The high-resolution face image generation unit 210 generates a high-resolution face image from the converted image having the highest face similarity.

特徴抽出部１５１は、入力画像に関しては、顔器官位置設定部１４０が設定した顔器官位置に基づいて、入力画像の変換画像を高解像化した画像（高解像化された変換画像）から各顔器官の特徴量を抽出する。
顔識別部１７１は、高解像顔画像の各顔器官の特徴量と、各登録画像の各顔器官の特徴量との顔器官類似度を算出し、算出された顔器官類似度に基づいて、入力画像に写っている顔と各登録画像に写っている顔が同一人物の顔か識別する。 For the input image, the feature extraction unit 151 extracts the feature amount of each facial organ from the high-resolution version of the converted image of the input image (the high-resolution converted image), based on the facial organ position set by the facial organ position setting unit 140.
The face identification unit 171 calculates the facial organ similarity between the feature amount of each facial organ of the high-resolution face image and the feature amount of each facial organ of each registered image, and, based on the calculated facial organ similarity, identifies whether the face shown in the input image and the face shown in each registered image are the face of the same person.

図７Ｂは、第２の実施形態における顔識別モードにおける処理の流れを示すフローチャートである。図７ＢのステップＳ１１０２からステップＳ１４０１までの処理は、図３のステップＳ１１０２からステップＳ１４０１までの処理と同じため、説明を省略する。
図７Ｃは、第２の実施形態における一連の処理の流れを示すフローチャートである。 FIG. 7B is a flowchart showing the flow of processing in the face identification mode in the second embodiment. The processing from step S1102 to step S1401 in FIG. 7B is the same as the processing from step S1102 to step S1401 in FIG. 3, and its description is therefore omitted.
FIG. 7C is a flowchart showing a flow of a series of processes in the second embodiment.

ステップＳ１４５０では、高解像顔画像生成部２１０が変換画像を高解像化する。高解像化される変換画像は、前記のマッチングの結果、顔器官位置設定部１４０が変換画像と登録画像とのペアの中で最も類似度（顔類似度）が高いと判断したペアの変換画像である。
ステップＳ１５０２では、特徴抽出部１５１が高解像化された変換画像から特徴量を抽出する。
ステップＳ１７０１では、特徴抽出部１５１が高解像化された変換画像から抽出した特徴量と、特徴抽出部１５１によって登録画像から抽出され顔画像登録部１６０に記憶されている特徴量との類似度（顔器官類似度）が算出される。そして、算出された顔器官類似度に基づいて、顔識別部１７０が入力画像に写っている人物と登録画像に写っている人物とが同一人物かを識別する。 In step S1450, the high-resolution face image generation unit 210 increases the resolution of the converted image. The converted image whose resolution is increased is the converted image of the pair that, as a result of the matching described above, the facial organ position setting unit 140 determined to have the highest similarity (face similarity) among the pairs of converted images and registered images.
In step S1502, the feature extraction unit 151 extracts feature amounts from the high-resolution converted image.
In step S1701, the similarity (facial organ similarity) is calculated between the feature amounts extracted by the feature extraction unit 151 from the high-resolution converted image and the feature amounts extracted by the feature extraction unit 151 from the registered image and stored in the face image registration unit 160. Then, based on the calculated facial organ similarity, the face identifying unit 170 identifies whether the person shown in the input image and the person shown in the registered image are the same person.

［第３の実施形態］
まず、始めに高解像顔画像生成技術であるhallucination技術について説明する。
・hallucination技術の説明
hallucination技術は、低解像顔画像から高解像顔画像を生成する技術である。その原理の概要は、入力された低解像顔画像を他人の高解像顔画像で近似するというものである。詳細を説明する。 [Third Embodiment]
First, the hallucination technique, which is a high-resolution face image generation technique, will be described.
・ Description of hallucination technology
The hallucination technique is a technique for generating a high-resolution face image from a low-resolution face image. The outline of the principle is that an input low-resolution face image is approximated by another person's high-resolution face image. Details will be described.

まずは、予め様々な人物の顔画像を利用して高解像と低解像がペアとなっている高解像化辞書（数式１）を学習により用意する。
高解像と低解像のペアは数式２に示すように複数格納されている。
また、高解像化辞書を構成する第１のペア、第１のペアを構成する低解像辞書、第１のペアを構成する高解像辞書のそれぞれを数式３のように記述すると、高解像化辞書と低解像辞書と高解像辞書との関係は数式４のとおりである。 First, a high resolution dictionary (Equation 1) in which high resolution and low resolution are paired using face images of various persons in advance is prepared by learning.
A plurality of pairs of high resolution and low resolution are stored as shown in Equation 2.
In addition, when the first pair constituting the high-resolution dictionary, the low-resolution dictionary constituting the first pair, and the high-resolution dictionary constituting the first pair are each written as in Equation 3, the relationship among the high-resolution dictionary, the low-resolution dictionary, and the high-resolution dictionary of each pair is as shown in Equation 4.

高解像化辞書Ｄとして輝度画像を用いる場合には、高解像の輝度画像と低解像の輝度画像がペアとなって格納される。例えば、高解像の輝度画像は、映像中の顔が誰かを特定することが十分に可能なくらい鮮明な画像である。一方、低解像の輝度画像は、顔が小さすぎるために又は顔がボケているために映像中の顔が誰かを特定するのが難しい画像となる。 When a luminance image is used as the high resolution dictionary D, a high resolution luminance image and a low resolution luminance image are stored as a pair. For example, a high-resolution luminance image is an image that is clear enough to identify who the face in the video is. On the other hand, a low-resolution luminance image is an image in which it is difficult to specify who the face in the video is because the face is too small or the face is blurred.

次に、入力された低解像顔画像Ｉ_Ｌから低解像部分画像を切り出し、この低解像部分画像を高解像化辞書Ｄに記憶されている高解像と低解像のペア辞書のうち、低解像辞書の線形和で近似する。数式５は近似の結果である。
これにより、低解像部分画像を近似する低解像辞書と結合係数α（α１、α２、α３、．．．）が求まり、そして、低解像辞書に対応する高解像辞書と結合係数α（α１、α２、α３、．．．）を用いて高解像部分画像を生成する。数式６は生成された高解像部分画像を表す式である。 Next, a low-resolution partial image is cut out from the input low-resolution face image I_L, and this partial image is approximated by a linear sum of the low-resolution dictionaries among the paired high-resolution and low-resolution dictionaries stored in the high-resolution dictionary D. Equation 5 is the result of this approximation.
This yields the low-resolution dictionaries that approximate the low-resolution partial image and the coupling coefficients α (α1, α2, α3, ...). A high-resolution partial image is then generated using the high-resolution dictionaries corresponding to those low-resolution dictionaries and the same coupling coefficients α (α1, α2, α3, ...). Equation 6 expresses the generated high-resolution partial image.
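
Equations 5 and 6 are not reproduced in this text. The relationship the passage describes can be sketched as below, where the symbols are assumptions introduced only for illustration: x_L is the low-resolution partial image, d_L^(i) and d_H^(i) are the low-resolution and high-resolution dictionaries of the i-th pair, and the hatted x_H is the generated high-resolution partial image. The notation of the publication's own equations may differ.

```latex
% Approximation of the low-resolution partial image (cf. Equation 5)
x_L \approx \sum_{i} \alpha_i \, d_L^{(i)}

% High-resolution partial image built with the same coefficients (cf. Equation 6)
\hat{x}_H = \sum_{i} \alpha_i \, d_H^{(i)}
```

The point of the construction is that the coefficients are estimated on the low-resolution side and reused unchanged on the high-resolution side.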

なお、高解像化辞書Ｄは輝度画像ではなく、エッジなどのような顔画像に共通な基底画像を利用しても良い。基底画像の例としては、非特許文献３のように主成分分析による固有顔などがある。 Note that the high-resolution dictionary D may use a base image common to face images such as edges instead of a luminance image. As an example of the base image, there is an eigenface by principal component analysis as in Non-Patent Document 3.

次にhallucination技術の問題点を説明する。
・hallucination技術の問題点
hallucination技術では、低解像顔画像を複数のブロックに分割し、各ブロック毎に高解像化辞書Ｄを利用して高解像化を行うため、目や口などの位置を所定の位置に合わせるような高精度な位置合わせが必要である。もし、高精度な位置合わせができていない場合には、生成された高解像顔画像が部分的に破綻してしまい、十分な顔認証精度を実現できない可能性がある。 Next, the problem of hallucination technology is explained.
・ Problems of hallucination technology
In the hallucination technique, the low-resolution face image is divided into a plurality of blocks, and the resolution of each block is increased using the high-resolution dictionary D. This requires highly accurate alignment that brings the eyes, mouth, and other parts to predetermined positions. If such alignment is not achieved, the generated high-resolution face image may partially break down, and sufficient face authentication accuracy may not be obtained.

例えば、図８のように低解像顔画像１８００１から高解像度の画像（高解像画像）１８００３を生成した時に、ブロック１８００２に含まれる人の目（左目）は、ブロック１８００２に対応するブロック１８００４においても人の目として認識できる。しかし、ブロック８０１及び８０２にまたがる人の目（右目）は、ブロック８０１及び８０２に対応するブロック８０３及び８０４においては破綻してしまい、人の目として認識することができない。 For example, when a high-resolution image 18003 is generated from the low-resolution face image 18001 as shown in FIG. 8, the human eye (left eye) contained in block 18002 can still be recognized as a human eye in block 18004, which corresponds to block 18002. However, the human eye (right eye) straddling blocks 801 and 802 breaks down in blocks 803 and 804, which correspond to blocks 801 and 802, and cannot be recognized as a human eye.

一方、低解像顔画像１８００１の顔位置を左にシフトした低解像顔画像１８００５から高解像画像１８００７を生成した時には、ブロック１８００６に含まれる人の目（右目）は、当該ブロックに対応するブロック１８００８においても人の目として認識できる。しかし、ブロック８０５及び８０６にまたがる人の目（左目）は、ブロック８０５及び８０６に対応するブロック８０７及び８０８においては破綻してしまい、人の目として認識することができない、というような現象が発生する。
以上のように、hallucination技術では目や口などの位置を所定の位置に合わせることが非常に重要であり、これができていない場合には画像の一部が破綻してしまうような現象が発生する。しかしながら、低解像顔画像に対して高精度な位置合わせをすることは困難である。そこで、第３の実施形態では、高解像化処理で複数の高解像顔画像を生成し、その中から最も類似度が高い領域を利用することによって、破綻していない領域だけを利用して顔認証を行う。以下で、その詳細を説明する。 On the other hand, when the high-resolution image 18007 is generated from the low-resolution face image 18005, which is obtained by shifting the face position of the low-resolution face image 18001 to the left, the human eye (right eye) contained in block 18006 can still be recognized as a human eye in the corresponding block 18008. However, the human eye (left eye) straddling blocks 805 and 806 breaks down in the corresponding blocks 807 and 808 and can no longer be recognized as a human eye.
As described above, in the hallucination technique it is very important to align the eyes, mouth, and other parts with predetermined positions; if this is not done, part of the image breaks down. However, it is difficult to align a low-resolution face image with high accuracy. Therefore, in the third embodiment, a plurality of high-resolution face images are generated by the high-resolution processing, and the regions with the highest similarity among them are used, so that face authentication is performed using only regions that have not broken down. The details are described below.
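
The block-boundary problem described above can be illustrated with a small one-dimensional sketch. The block size and eye coordinates are hypothetical; the point is only that a shift which moves one eye entirely inside a block can push the other eye across a block boundary.

```python
def blocks_covered(start, end, block_size):
    """Indices of the blocks overlapped by the pixel interval [start, end)."""
    first = start // block_size
    last = (end - 1) // block_size
    return list(range(first, last + 1))

def straddles(start, end, block_size):
    """True when the interval spans more than one block."""
    return len(blocks_covered(start, end, block_size)) > 1

BLOCK = 16                 # hypothetical block width in pixels
left_eye = (18, 30)        # hypothetical horizontal extent of one eye
right_eye = (40, 52)       # hypothetical horizontal extent of the other eye
shift = -8                 # shift the face 8 pixels to the left

before = (straddles(*left_eye, BLOCK), straddles(*right_eye, BLOCK))
after = (
    straddles(left_eye[0] + shift, left_eye[1] + shift, BLOCK),
    straddles(right_eye[0] + shift, right_eye[1] + shift, BLOCK),
)
```

With these numbers, one eye is block-aligned before the shift and the other after it, which is why the third embodiment keeps, for each region, the result from whichever shifted image is not broken.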

図９は、第３の実施形態における画像認識装置の機能構成の一例を示したブロック図である。図９に示すように、本実施形態の機能構成は、画像取得部１１０、顔位置検出部１２０、顔器官位置検出部１３０、顔器官位置設定部１４０、特徴抽出部１５２、及び顔画像登録部１６０、並びに高解像画像生成部３１０及び顔識別部３４０を含む。顔識別部３４０は、類似度算出部３２０及び類似度統合部３３０を含む。 FIG. 9 is a block diagram illustrating an example of a functional configuration of the image recognition apparatus according to the third embodiment. As shown in FIG. 9, the functional configuration of the present embodiment includes an image acquisition unit 110, a face position detection unit 120, a face organ position detection unit 130, a face organ position setting unit 140, a feature extraction unit 152, and a face image registration unit. 160, and a high-resolution image generation unit 310 and a face identification unit 340. The face identification unit 340 includes a similarity calculation unit 320 and a similarity integration unit 330.

画像取得部１１０、顔位置検出部１２０、顔器官位置検出部１３０、顔器官位置設定部１４０、及び顔画像登録部１６０は、第１の実施形態又は第２の実施形態における画像取得部１１０等と同様であるから説明を省略する。 The image acquisition unit 110, the face position detection unit 120, the face organ position detection unit 130, the face organ position setting unit 140, and the face image registration unit 160 are the same as the corresponding units in the first or second embodiment, and their description is therefore omitted.

高解像顔画像生成部３１０は、顔類似度が最も高くなる変換画像から、顔の大きさ、顔の位置、顔の向きの少なくとも１つが異なる複数の２回変換画像を生成する。そして、複数の２回変換画像の各２回変換画像から２回変換高解像顔画像（以下では、単に高解像顔画像とも言う）を生成する。 The high-resolution face image generation unit 310 generates, from the converted image having the highest face similarity, a plurality of twice-converted images that differ in at least one of face size, face position, and face orientation. It then generates a twice-converted high-resolution face image (hereinafter also simply called a high-resolution face image) from each of the twice-converted images.

特徴抽出部１５２は、
顔器官位置設定部１４０が設定した顔器官位置に基づいて、２回変換高解像顔画像の各顔器官の特徴量を抽出し、
顔画像登録部１６０が記憶している顔器官位置に基づいて、各登録画像の各顔器官の特徴量を抽出する。 The feature extraction unit 152
extracts the feature amount of each facial organ of the twice-converted high-resolution face image, based on the facial organ position set by the facial organ position setting unit 140, and
extracts the feature amount of each facial organ of each registered image, based on the facial organ position stored in the face image registration unit 160.

類似度算出部３２０は、２回変換高解像顔画像の各顔器官の特徴量と、各登録画像の各顔器官の特徴量との顔器官類似度を算出する。 The similarity calculation unit 320 calculates the facial organ similarity between the feature amount of each facial organ of the twice-converted high-resolution face image and the feature amount of each facial organ of each registered image.

類似度統合部３３０は、少なくとも
第１の２回変換高解像顔画像を用いて算出された第１の顔器官についての顔器官類似度の最高値と、
第２の２回変換高解像顔画像を用いて算出された第２の顔器官についての顔器官類似度の最高値と、を統合して統合顔器官類似度を得る。
第１の顔器官とは、例えば右目であり、第２の顔器官とは例えば左目である。 The similarity integration unit 330 integrates at least the highest value of the facial organ similarity for the first facial organ, calculated using the first twice-converted high-resolution face image,
and the highest value of the facial organ similarity for the second facial organ, calculated using the second twice-converted high-resolution face image, to obtain an integrated facial organ similarity.
The first facial organ is, for example, the right eye, and the second facial organ is, for example, the left eye.

ここで、「少なくとも」とは
第３の２回変換高解像顔画像を用いて算出された第３の顔器官（例えば、口）についての顔器官類似度の最高値や、
第４の２回変換高解像顔画像を用いて算出された第４の顔器官（右目、左目、口以外の顔器官）についての顔器官類似度の最高値と、を統合して統合顔器官類似度を得るとしても良い、という意味である。 Here, "at least" means that the highest value of the facial organ similarity for a third facial organ (for example, the mouth), calculated using a third twice-converted high-resolution face image,
and the highest value of the facial organ similarity for a fourth facial organ (a facial organ other than the right eye, left eye, and mouth), calculated using a fourth twice-converted high-resolution face image, may also be integrated to obtain the integrated facial organ similarity.

「第１の２回変換高解像顔画像」とは「当該画像の第１の顔器官と登録画像の第１の顔器官との組合せ」から「第１の顔器官についての顔器官類似度の最高値」が得られた２回変換高解像画像ということを意味する。
また、「第２の２回変換高解像顔画像」とは「当該画像の第２の顔器官と登録画像の第２の顔器官との組合せ」から「第２の顔器官についての顔器官類似度の最高値」が得られた２回変換高解像画像ということを意味する。 The "first twice-converted high-resolution face image" means the twice-converted high-resolution image for which the combination of the first facial organ of that image and the first facial organ of the registered image yielded the highest value of the facial organ similarity for the first facial organ.
Likewise, the "second twice-converted high-resolution face image" means the twice-converted high-resolution image for which the combination of the second facial organ of that image and the second facial organ of the registered image yielded the highest value of the facial organ similarity for the second facial organ.

ある「２回変換高解像顔画像」から「第１の顔器官についての顔器官類似度の最高値」が得られ、かつ
これと同じ「２回変換高解像顔画像」から「第２の顔器官についての顔器官類似度の最高値」が得られることもあり得る。この場合、「第１の２回変換高解像顔画像」と「第２の２回変換高解像顔画像」とは同一の「２回変換高解像顔画像」となる。 It is also possible that the highest value of the facial organ similarity for the first facial organ is obtained from a certain twice-converted high-resolution face image and that the highest value of the facial organ similarity for the second facial organ is obtained from that same image. In this case, the "first twice-converted high-resolution face image" and the "second twice-converted high-resolution face image" are the same twice-converted high-resolution face image.

顔識別部３４０は、類似度統合部３３０が得た統合顔器官類似度（統合類似度）に基づいて、前記入力画像に写っている顔と前記各登録画像に写っている顔が同一人物の顔か識別する。 Based on the integrated facial organ similarity (integrated similarity) obtained by the similarity integration unit 330, the face identification unit 340 identifies whether the face shown in the input image and the face shown in each registered image are the face of the same person.
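
The behavior of the similarity integration unit 330 can be sketched as a per-organ maximum over the twice-converted high-resolution face images, followed by a combination step. How the selected maxima are combined is not stated in this passage, so the arithmetic mean used below is an assumption, as are the function and variable names.

```python
def integrate_similarities(similarity_table):
    """similarity_table[k][organ] is the facial organ similarity obtained from
    the k-th twice-converted high-resolution face image.

    Returns (best, integrated), where best[organ] = (k, score) records which
    image gave the highest similarity for that organ, and integrated is the
    mean of the selected maxima (the mean is an assumed integration rule).
    """
    best = {}
    for organ in similarity_table[0]:
        scores = [row[organ] for row in similarity_table]
        k = max(range(len(scores)), key=scores.__getitem__)
        best[organ] = (k, scores[k])
    integrated = sum(score for _, score in best.values()) / len(best)
    return best, integrated
```

The index stored with each maximum shows that the best image may differ from organ to organ, or may be the same image for several organs, as the text notes.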

図１０、および図１１は、本発明の第３の実施形態における顔識別モードの処理の流れを示している。
顔登録モードは第１の実施形態の顔登録モード同様であるので、説明を省略する。 10 and 11 show the flow of processing in the face identification mode in the third embodiment of the present invention.
Since the face registration mode is the same as the face registration mode of the first embodiment, description thereof is omitted.

＜顔識別モード＞
図１０のステップＳ２１０２からステップＳ２３０２までの処理は図２のステップＳ１１０１からステップＳ１２０２までの処理と同様である。
ステップＳ２３０３以降を説明する前に本発明の第３の実施形態でのポイントを説明する。 <Face identification mode>
The processing from step S2102 to step S2302 in FIG. 10 is the same as the processing from step S1101 to step S1202 in FIG.
Before describing step S2303 and subsequent steps, points in the third embodiment of the present invention will be described.

＜第３の実施形態のポイント＞
上述で説明した通り、hallucination技術を用いた場合、目や口などの位置を所定の位置に合わせることができず、画像の一部が破綻してしまうような現象が発生する。そこで、第３の実施形態では、ステップＳ２３０３でシフト量、スケール、回転量の異なる複数の変換画像を生成し、生成した複数の変換画像に対して高解像化処理を行い、登録画像との類似度が高い領域だけを利用して顔認識を行う。その結果、破綻していない領域だけを利用して顔認証を行うことになる。 <Points of Third Embodiment>
As described above, when the hallucination technique is used, the eyes, mouth, and other parts sometimes cannot be aligned with the predetermined positions, and part of the image then breaks down. Therefore, in the third embodiment, a plurality of converted images having different shift amounts, scales, and rotation amounts are generated in step S2303, high-resolution processing is performed on the generated converted images, and face recognition is performed using only the regions that have a high similarity with the registered image. As a result, face authentication is performed using only regions that have not broken down.

図１２は、複数の変換画像から高解像顔画像を生成し、生成した高解像顔画像のうちの破綻していない領域（ブロック）を利用して顔認証を実施するという処理の流れの概要を示す図である。
変換画像１８００１と変換画像１８００５はシフト量が異なる変換画像の例である。変換画像１８００１から生成した高解像顔画像１８００３のブロック１８００４と、変換画像１８００５から生成した高解像顔画像１８００７のブロック１８００８とは、登録画像との顔器官類似度が高い（つまり、破綻していない）領域の例である。 FIG. 12 outlines the flow of processing in which high-resolution face images are generated from a plurality of converted images and face authentication is performed using the regions (blocks) of the generated high-resolution face images that have not broken down.
The converted image 18001 and the converted image 18005 are examples of converted images having different shift amounts. Block 18004 of the high-resolution face image 18003 generated from the converted image 18001 and block 18008 of the high-resolution face image 18007 generated from the converted image 18005 are examples of regions that have a high facial organ similarity with the registered image (that is, regions that have not broken down).

破綻していないブロック１８００４の位置と登録画像１８００９のブロック１８０１１の位置が対応し、破綻していないブロック１８００８の位置とブロック１８０１０の位置とが対応する。そして、ブロック１８００４から抽出される特徴とブロック１８０１１から抽出される特徴との類似度、及びブロック１８００８から抽出される特徴とブロック１８０１０から抽出される特徴との類似度に基づいて顔を識別する。
以上が、第３の実施形態のポイントである。
続いて、第３の実施形態における処理を順次説明する。 The position of the unbroken block 18004 corresponds to the position of block 18011 of the registered image 18009, and the position of the unbroken block 18008 corresponds to the position of block 18010. The face is then identified based on the similarity between the feature extracted from block 18004 and the feature extracted from block 18011, and the similarity between the feature extracted from block 18008 and the feature extracted from block 18010.
The above is the point of the third embodiment.
Subsequently, processing in the third embodiment will be sequentially described.

＜第３の実施形態における処理の流れ＞
ステップＳ２３０３では、ステップＳ２３０２で生成された正規化画像からシフト量、スケール、回転量の異なる複数の変換画像を生成する。例えば、アフィン変換などを用いる。
図１３Ａに示すように、入力画像からスケール変換、シフト変換、回転変換などにより第１〜第３の変換画像を生成する。なお、スケール変換、シフト変換、回転変換のそれぞれによって複数の変換画像が生成されること、各変換を適宜組み合わせても良いことは第１、第２の実施形態と同様である。 <Processing Flow in Third Embodiment>
In step S2303, a plurality of converted images having different shift amounts, scales, and rotation amounts are generated from the normalized image generated in step S2302. For example, affine transformation is used.
As illustrated in FIG. 13A, first to third converted images are generated from an input image by scale conversion, shift conversion, rotation conversion, and the like. As in the first and second embodiments, a plurality of converted images are generated by scale conversion, shift conversion, and rotation conversion, and the conversions may be appropriately combined.

＜ペアの探索＞
ステップＳ２３０４では、図４のステップＳ１４１２と同様に、類似度が最も高くなる変換画像と登録画像とのペアを探索する。 <Search for pairs>
In step S2304, as in step S1412 of FIG. 4, a search is made for the pair of a converted image and a registered image having the highest similarity.

＜２回変換画像の生成＞
ステップＳ２３０５では、ステップＳ２３０４で類似度が最も高くなると判断されたペアの変換画像からシフト量、スケール、回転量の異なる複数の変換画像（以下、「２回変換画像」とも記載する）を生成する。
図１３Ａに示す例では、第１の変換画像が類似度が最も高くなった変換画像であり、その第１の変換画像から、第１の２回変換画像、第２の２回変換画像を生成している。
図１０のステップＳ２３０５におけるシフト量、スケール、回転量は、ステップＳ２３０３におけるシフト量、スケール、回転量よりも小さい。ステップＳ２３０３は顔全体の類似度を比較するための前処理であるのに対し、ステップＳ２３０５は顔全体と比べて小さい顔器官の類似度を比較するための前処理だからである。例えば、ステップＳ２３０３では１ｃｍ単位でシフトさせ、ステップＳ２３０５では１ｍｍ単位でシフトさせる。 <Generate twice converted image>
In step S2305, a plurality of converted images having different shift amounts, scales, and rotation amounts (hereinafter also referred to as "twice-converted images") are generated from the converted image of the pair determined in step S2304 to have the highest similarity.
In the example shown in FIG. 13A, the first converted image is the converted image with the highest similarity, and the first twice-converted image and the second twice-converted image are generated from it.
The shift amount, scale, and rotation amount in step S2305 in FIG. 10 are smaller than the shift amount, scale, and rotation amount in step S2303. This is because step S2303 is a pre-process for comparing the similarity of the whole face, whereas step S2305 is a pre-process for comparing the similarity of a facial organ that is smaller than the entire face. For example, in step S2303, the shift is performed in units of 1 cm, and in step S2305, the shift is performed in units of 1 mm.
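
The coarse-then-fine relationship between steps S2303 and S2305 can be sketched with shift amounts alone. The step sizes, the grid radius, and the choice of the best coarse shift below are hypothetical and expressed in arbitrary pixel units; scale and rotation would be refined in the same manner.

```python
def shift_grid(center, step, radius):
    """Shifts centered on `center`, spaced by `step`, covering
    `radius` steps in each direction on both axes."""
    cx, cy = center
    return [
        (cx + i * step, cy + j * step)
        for i in range(-radius, radius + 1)
        for j in range(-radius, radius + 1)
    ]

# First conversion (step S2303): coarse shifts for whole-face matching.
coarse = shift_grid(center=(0, 0), step=4, radius=1)

# Assume the whole-face matching of step S2304 selected this coarse shift.
best_coarse = (4, 0)

# Second conversion (step S2305): finer shifts around the selected image,
# suited to comparing facial organs that are smaller than the whole face.
fine = shift_grid(center=best_coarse, step=1, radius=1)
```

The fine grid stays within one coarse step of the selected converted image, so the second stage only refines the alignment found by the first.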

ステップＳ２４０１では、ステップＳ２３０５で生成した複数の変換画像に対して、上述したようなhallucination技術を適用することにより高解像化を行う。
図１３Ｂに示す例では、第１の２回変換画像を高解像化して第１の２回変換高解像画像とし、第２の２回変換画像を高解像化して第２の２回変換高解像画像とした。 In step S2401, high resolution is performed by applying the hallucination technique as described above to the plurality of converted images generated in step S2305.
In the example shown in FIG. 13B, the resolution of the first twice-converted image is increased to obtain the first twice-converted high-resolution image, and the resolution of the second twice-converted image is increased to obtain the second twice-converted high-resolution image.

＜特徴の抽出＞
ステップＳ２５０２では、ステップＳ２４０１で生成された全ての高解像顔画像に対して非特許文献１に記載されているようなＬＢＰ特徴を抽出する。
＜類似度の算出＞
ステップＳ２７０１では、高解像顔画像のあるブロックからステップＳ２５０２で抽出したＬＢＰ特徴と、ステップＳ１６０１（図２）で記憶した顔画像の同じブロックから抽出したＬＢＰ特徴との類似度をそれぞれ算出する。 <Feature extraction>
In step S2502, LBP features as described in Non-Patent Document 1 are extracted from all the high-resolution face images generated in step S2401.
<Calculation of similarity>
In step S2701, for each block, the similarity is calculated between the LBP feature extracted in step S2502 from that block of a high-resolution face image and the LBP feature extracted from the same block of the face image stored in step S1601 (FIG. 2).
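
Block-wise LBP feature extraction and comparison (steps S2502 and S2701) can be sketched as follows. The sketch uses the basic 3x3 LBP operator with a 256-bin histogram and histogram intersection as the similarity; the exact LBP variant of Non-Patent Document 1 and the similarity measure used in the publication are not given in this passage, so both choices are assumptions.

```python
def lbp_histogram(block):
    """256-bin histogram of basic 3x3 LBP codes for a 2-D list of gray values.
    Border pixels are skipped because they lack a full neighborhood."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, len(block) - 1):
        for c in range(1, len(block[0]) - 1):
            center = block[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbor is at least as bright as the center.
                if block[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist

def histogram_intersection(h1, h2):
    """Shared mass of two histograms, normalized by the total of the first."""
    total = sum(h1)
    if total == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(h1, h2)) / total
```

In use, `lbp_histogram` would be applied to the same block position in a twice-converted high-resolution image and in the registered image, and the two histograms compared.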

図１３Ｂに示す例では、
第１の２回変換画像のブロック１８０２２から抽出したＬＢＰ特徴と、第１の登録画像のブロック１８０１０から抽出したＬＢＰ特徴との類似度を算出し、
第２の２回変換画像のブロック１８００８から抽出したＬＢＰ特徴と、第１の登録画像のブロック１８０１０から抽出したＬＢＰ特徴との類似度を算出する。 In the example shown in FIG. 13B,
the similarity between the LBP feature extracted from block 18022 of the first twice-converted image and the LBP feature extracted from block 18010 of the first registered image is calculated, and
the similarity between the LBP feature extracted from block 18008 of the second twice-converted image and the LBP feature extracted from block 18010 of the first registered image is calculated.

図１３Ｃに示す例では、
第１の２回変換画像のブロック１８００４から抽出したＬＢＰ特徴と、第１の登録画像のブロック１８０１１から抽出したＬＢＰ特徴との類似度を算出し、
第２の２回変換画像のブロック１８０１２から抽出したＬＢＰ特徴と、第１の登録画像のブロック１８０１１から抽出したＬＢＰ特徴との類似度を算出する。 In the example shown in FIG. 13C,
the similarity between the LBP feature extracted from block 18004 of the first twice-converted image and the LBP feature extracted from block 18011 of the first registered image is calculated, and
the similarity between the LBP feature extracted from block 18012 of the second twice-converted image and the LBP feature extracted from block 18011 of the first registered image is calculated.

＜類似度の統合＞
ステップＳ２７０２では、ブロック位置が同じ複数のペアの類似度の中から最も類似度の高いペアの類似度を選択し、選択された類似度を統合する。
例えば、図１３Ｂの高解像顔画像１８００３のブロック１８０２２と、登録画像１８００９のブロック１８０１０との類似度（顔器官類似度）と、
高解像顔画像１８００７のブロック１８００８と、登録画像１８００９のブロック１８０１０との類似度（顔器官類似度）と、の中から最も類似度の高いペア（ブロック１８００８とブロック１８０１０とのペア）の類似度を選択する。 <Similarity integration>
In step S2702, the similarity of the pair having the highest similarity is selected from the similarities of the plurality of pairs having the same block position, and the selected similarities are integrated.
For example, the similarity (facial organ similarity) between block 18022 of the high-resolution face image 18003 in FIG. 13B and block 18010 of the registered image 18009
is compared with the similarity (facial organ similarity) between block 18008 of the high-resolution face image 18007 and block 18010 of the registered image 18009, and the similarity of the pair having the highest similarity (the pair of block 18008 and block 18010) is selected.

同様に、図１３Ｃの高解像顔画像１８００３のブロック１８００４と、登録画像１８００９のブロック１８０１１との類似度（顔器官類似度）と、
高解像顔画像１８００７のブロック１８０１２と、登録画像１８００９のブロック１８０１１との類似度（顔器官類似度）と、の中から最も類似度の高いペア（ブロック１８００４とブロック１８０１１とのペア）の類似度を選択する。
そして、選択された類似度（ブロック１８００８とブロック１８０１０とのペアの類似度、ブロック１８００４とブロック１８０１１とのペアの類似度）を統合する。 Similarly, the similarity (facial organ similarity) between the block 18004 of the high-resolution face image 18003 in FIG. 13C and the block 18011 of the registered image 18009,
and the similarity (facial organ similarity) between block 18012 of the high-resolution face image 18007 and block 18011 of the registered image 18009 are compared, and the similarity of the pair having the highest similarity (the pair of block 18004 and block 18011) is selected.
Then, the selected similarities (the similarity of the pair of block 18008 and block 18010, and the similarity of the pair of block 18004 and block 18011) are integrated.

The similarity of the pair of block 18008 of the high-resolution face image 18007 and block 18010 of the registered image 18009 is selected, and the similarity integration described later is performed on the selected similarity.
In step S2801, face identification is performed based on the integrated similarity.

<From feature extraction to similarity integration>
Steps S2502 to S2702 are described in detail with reference to FIG. 11.
First, in step S3010, the features of a facial organ are extracted.
In step S3011, the facial organ similarity is calculated.
In step S3012, if the facial organ similarity has not yet been calculated for all the twice-converted high-resolution images, the process returns to step S3010, and the processing from step S3010 onward is repeated for a new twice-converted high-resolution image. If the facial organ similarity has been calculated for all the twice-converted high-resolution images, the process proceeds to step S3013.

For example, suppose that the first through n-th twice-converted high-resolution images have been generated (n is an arbitrary natural number).
In this case,
the first similarity between the right eye of the first twice-converted high-resolution image and the right eye of the first registered image is calculated,
the second similarity between the right eye of the second twice-converted high-resolution image and the right eye of the first registered image is calculated,
:
the (n−1)-th similarity between the right eye of the (n−1)-th twice-converted high-resolution image and the right eye of the first registered image is calculated,
and once the n-th similarity between the right eye of the n-th twice-converted high-resolution image and the right eye of the first registered image has been calculated, the process proceeds to step S3013.

In step S3013, the highest value of the facial organ similarity is stored.
For example, suppose as above that the first through n-th twice-converted high-resolution images have been generated (n is an arbitrary natural number).
In this case,
the highest value among the first similarity, the second similarity, ..., the (n−1)-th similarity, and the n-th similarity for the right eye is stored.
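Steps S3010 to S3013 for a single facial organ can be sketched as below. The sketch assumes that each organ feature is a normalized histogram and that similarity is measured by histogram intersection. Both are assumptions chosen for illustration, since this passage does not fix the similarity measure. The names `histogram_intersection` and `highest_organ_similarity` and the feature values are hypothetical.

```python
def histogram_intersection(a, b):
    """Similarity of two equal-length histograms (measure assumed for this sketch)."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same length")
    return sum(min(x, y) for x, y in zip(a, b))


def highest_organ_similarity(converted_features, registered_feature):
    """Compute the 1st..n-th similarities for one organ and return the highest.

    converted_features: organ feature from each twice-converted high-resolution image.
    registered_feature: feature of the same organ in the registered image.
    """
    similarities = [
        histogram_intersection(feature, registered_feature)
        for feature in converted_features
    ]
    return max(similarities), similarities


# Hypothetical right-eye features for n = 3 twice-converted high-resolution images.
registered_right_eye = [0.5, 0.3, 0.2]
converted_right_eyes = [
    [0.2, 0.2, 0.6],
    [0.5, 0.25, 0.25],
    [0.1, 0.8, 0.1],
]
highest, all_similarities = highest_organ_similarity(
    converted_right_eyes, registered_right_eye
)
```

Only the highest value needs to be stored for the later integration; the full list is returned here solely to make the intermediate similarities visible.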

In step S3014, if the highest value of the facial organ similarity has not yet been stored for all the facial organs, the process returns to step S3010, and the processing from step S3010 onward is repeated for a new facial organ.
For example, suppose that the facial organ similarity is calculated for the right eye, the left eye, and the mouth, that the highest value for each is stored, and that these values are integrated to identify the face. Suppose further that the highest value has been stored for the right eye but not yet for the left eye.

In this case,
the first similarity between the left eye of the first twice-converted high-resolution image and the left eye of the first registered image is calculated,
the second similarity between the left eye of the second twice-converted high-resolution image and the left eye of the first registered image is calculated,
:
and once the n-th similarity between the left eye of the n-th twice-converted high-resolution image and the left eye of the first registered image has been calculated, the highest value among the first similarity, the second similarity, ..., the (n−1)-th similarity, and the n-th similarity for the left eye is stored.

The process then returns to step S3010, and the processing from step S3010 onward is repeated for the mouth. The first similarity between the mouth of the first twice-converted high-resolution image and the mouth of the first registered image, ..., and the n-th similarity between the mouth of the n-th twice-converted high-resolution image and the mouth of the first registered image are calculated, and the highest value among the first to n-th similarities is stored.

If the highest value of the facial organ similarity has been stored for all the facial organs, the process proceeds to step S3015.
In step S3015, the facial organ similarities are integrated.
For example, the highest facial organ similarity for the right eye, the highest facial organ similarity for the left eye, and the highest facial organ similarity for the mouth are summed.

Which of the first to n-th twice-converted high-resolution images yields the highest value of each facial organ similarity may differ from one facial organ to another.
For example, for the right eye, the similarity between the first twice-converted high-resolution image and the first registered image may be the highest,
for the left eye, the similarity between the first twice-converted high-resolution image and the first registered image may be the highest,
and for the mouth, the similarity between the third twice-converted high-resolution image and the first registered image may be the highest.
Alternatively, for the right eye, the similarity between the first twice-converted high-resolution image and the first registered image may be the highest,
for the left eye, the similarity between the third twice-converted high-resolution image and the first registered image may be the highest,
and for the mouth, the similarity between the fifth twice-converted high-resolution image and the first registered image may be the highest.
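The point that each facial organ may take its highest similarity from a different twice-converted high-resolution image can be illustrated as follows. The similarity values are invented so as to reproduce the second example above (first, third, and fifth images). The function name `best_per_organ` is hypothetical, and the final summation follows the example given for step S3015.

```python
def best_per_organ(similarity_table):
    """Return {organ: (1-based index of the best image, highest similarity)}.

    similarity_table: {organ: [similarity with the 1st, 2nd, ..., n-th
    twice-converted high-resolution image]}.
    """
    result = {}
    for organ, similarities in similarity_table.items():
        index = max(range(len(similarities)), key=similarities.__getitem__)
        result[organ] = (index + 1, similarities[index])
    return result


# Hypothetical similarities against the first registered image, n = 5.
table = {
    "right_eye": [0.9, 0.6, 0.5, 0.4, 0.3],
    "left_eye": [0.5, 0.6, 0.8, 0.4, 0.3],
    "mouth": [0.2, 0.3, 0.4, 0.5, 0.7],
}
best = best_per_organ(table)

# Integration as in the step S3015 example: sum of the per-organ highest values.
integrated = sum(value for _index, value in best.values())
```

Tracking the index is not required for the integration itself; it is kept here only to show that the winning image varies by organ.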

In step S3016, if the facial organ similarities have not yet been integrated for all the registered images, the process returns to step S3010, and the processing from step S3010 onward is repeated for a new registered image.
Once the facial organ similarities have been integrated for every registered image, covering every facial organ and every combination with the converted images, the process proceeds to step S2801, and face identification is executed.
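The overall flow of steps S3010 to S3016 amounts to three nested loops, sketched below under stated assumptions. Features are reduced to single numbers and `toy_similarity` is a stand-in measure, neither of which comes from the source. The function names, the data, and the final decision rule (taking the registered image with the highest integrated similarity) are likewise hypothetical, since this passage says only that step S2801 identifies the face based on the integrated similarity.

```python
def integrated_similarities(registered, converted, similarity,
                            organs=("right_eye", "left_eye", "mouth")):
    """Return {registered_id: integrated facial organ similarity}.

    registered: {registered_id: {organ: feature}}
    converted:  list of {organ: feature}, one per twice-converted
                high-resolution image.
    similarity: callable(feature_a, feature_b) -> float (assumed measure).
    """
    scores = {}
    for registered_id, registered_features in registered.items():  # S3016 loop
        total = 0.0
        for organ in organs:                                        # S3014 loop
            # S3010 to S3013: similarity with every converted image, keep the highest.
            total += max(
                similarity(image[organ], registered_features[organ])
                for image in converted
            )
        scores[registered_id] = total                               # S3015: sum
    return scores


def toy_similarity(a, b):
    """Stand-in similarity for scalar features; not taken from the source."""
    return 1.0 - abs(a - b)


registered = {
    "A": {"right_eye": 0.2, "left_eye": 0.4, "mouth": 0.6},
    "B": {"right_eye": 0.9, "left_eye": 0.9, "mouth": 0.9},
}
converted = [
    {"right_eye": 0.2, "left_eye": 0.1, "mouth": 0.1},
    {"right_eye": 0.7, "left_eye": 0.4, "mouth": 0.5},
]
scores = integrated_similarities(registered, converted, toy_similarity)

# One possible decision rule (an assumption): pick the best-scoring registered image.
best_match = max(scores, key=scores.get)
```

In this toy data, registered image "A" takes its right-eye maximum from the first converted image and its left-eye and mouth maxima from the second, which is the behavior the description relies on to avoid broken regions.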

As described above, in the third embodiment, a plurality of high-resolution face images are generated by the resolution enhancement processing, and the regions with the highest similarity among them are used, so that face authentication is performed using only regions that have not broken down. As a result, highly accurate face authentication is possible even when accurate alignment is difficult and part of an image to which the hallucination technique has been applied breaks down.

[Other Embodiments]
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

DESCRIPTION OF SYMBOLS
110 Image acquisition unit
120 Face position detection unit
130 Facial organ position detection unit
140 Facial organ position setting unit
150 Feature extraction unit
160 Face image registration unit
170 Face identification unit

Claims

An image recognition apparatus comprising:
registration means for storing a plurality of registered images in association with the facial organ position of each facial organ shown in each registered image;
setting means for comparing the entire face area of an input image with the entire face area of each of the plurality of registered images to calculate a face similarity between the input image and the registered image, searching for the registered image corresponding to the input image based on the face similarity, and setting the facial organ position stored in association with the corresponding registered image as the facial organ position of the input image;
feature extraction means for extracting a feature amount of each facial organ of the input image based on the facial organ position set by the setting means, and extracting a feature amount of each facial organ of each registered image based on the facial organ position stored by the registration means; and
face identification means for calculating a facial organ similarity between the feature amount of each facial organ of the input image extracted by the feature extraction means and the feature amount of each facial organ of each registered image, and identifying, based on the calculated facial organ similarity, whether the face shown in the input image and the face shown in each registered image are the face of the same person.

The image recognition apparatus according to claim 1, wherein
the setting means
generates, from the input image, a plurality of converted images that differ in at least one of face size, face position, and face orientation, and calculates the face similarity between each generated converted image and each of the plurality of registered images,
searches for a corresponding pair of a converted image and a registered image based on the face similarity, and
sets the facial organ position stored in association with the registered image of the pair as the facial organ position of the converted image of the pair,
the feature extraction means
extracts a feature amount of each facial organ of the converted image of the pair based on the facial organ position set by the setting means, and
extracts a feature amount of each facial organ of each registered image based on the facial organ position stored by the registration means, and
the face identification means
calculates a facial organ similarity between the feature amount of each facial organ of the converted image and the feature amount of each facial organ of the registered image, both extracted by the feature extraction means, and
identifies, based on the calculated facial organ similarity, whether the face shown in the input image and the face shown in each registered image are the face of the same person.

The image recognition apparatus according to claim 1, wherein the input image has a low resolution, and the registered image has a high resolution.

The image recognition apparatus according to claim 2, further comprising generating means for generating a high-resolution face image from the converted image of the pair, wherein
the feature extraction means
extracts a feature amount of each facial organ of the high-resolution face image based on the facial organ position set by the setting means, and
extracts a feature amount of each facial organ of each registered image based on the facial organ position stored by the registration means, and
the face identification means
calculates a facial organ similarity between the feature amount of each facial organ of the high-resolution face image and the feature amount of each facial organ of each registered image, and
identifies, based on the calculated facial organ similarity, whether the face shown in the input image and the face shown in each registered image are the face of the same person.

The image recognition apparatus according to claim 4, wherein
the generating means
generates, from the converted image having the highest face similarity, a plurality of twice-converted images that differ in at least one of face size, face position, and face orientation, and
generates a high-resolution face image from each of the plurality of twice-converted images,
the feature extraction means
extracts a feature amount of each facial organ of each high-resolution face image based on the facial organ position set by the setting means, and
extracts a feature amount of each facial organ of each registered image based on the facial organ position stored by the registration means, and
the face identification means
calculates a facial organ similarity between the feature amount of each facial organ of each high-resolution face image and the feature amount of each facial organ of each registered image, and
identifies whether the face shown in the input image and the face shown in each registered image are the face of the same person based on an integrated similarity obtained by integrating at least the facial organ similarity for a first facial organ calculated using a first high-resolution face image and the facial organ similarity for a second facial organ calculated using a second high-resolution face image.

An image recognition method comprising:
a registration step of storing a plurality of registered images in association with the facial organ position of each facial organ shown in each registered image;
a setting step of comparing the entire face area of an input image with the entire face area of each of the plurality of registered images to calculate a face similarity between the input image and the registered image, searching for the registered image corresponding to the input image based on the face similarity, and setting the facial organ position stored in association with the corresponding registered image as the facial organ position of the input image;
a feature extraction step of extracting a feature amount of each facial organ of the input image based on the facial organ position set in the setting step, and extracting a feature amount of each facial organ of each registered image based on the facial organ position stored in the registration step; and
a face identification step of calculating a facial organ similarity between the feature amount of each facial organ of the input image extracted in the feature extraction step and the feature amount of each facial organ of each registered image, and identifying, based on the calculated facial organ similarity, whether the face shown in the input image and the face shown in each registered image are the face of the same person.

A program for causing a computer to function as each means of the image recognition apparatus according to any one of claims 1 to 5.