JP2011053951A

JP2011053951A - Image processing apparatus

Info

Publication number: JP2011053951A
Application number: JP2009202799A
Authority: JP
Inventors: Kotaro Yano; 光太郎矢野; Satoru Yashiro; 哲八代; Yasuhiro Ito; 靖浩伊藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2009-09-02
Filing date: 2009-09-02
Publication date: 2011-03-17
Anticipated expiration: 2029-09-02
Also published as: JP5578816B2

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of subject recognition in a moving image.

SOLUTION: An image processing apparatus extracts an area in which image information changes between a preceding frame (first frame) of a moving image and a subsequent frame (second frame) that follows it, sets a search area in the subsequent frame based on the extracted area and on the subject area detected in the preceding frame, and discriminates the subject within the search area set in the subsequent frame.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an image processing apparatus and an image processing method for moving images.

Image processing methods that automatically detect a specific subject pattern from a still image are very useful and are used, for example, to identify human faces. Such methods can be applied in many fields, including teleconferencing, man-machine interfaces, security, monitoring systems that track human faces, and image compression.

In recent years, subjects have also been detected from moving images. To detect faces in a moving image in real time, a method has been disclosed that identifies areas that have not changed over time and excludes them from the face detection processing (see, for example, Patent Document 1).

Patent Document 1: JP 2005-174352 A

However, although the method described above shortens the image processing time needed to recognize a subject in a moving image, it is difficult to improve the accuracy of that recognition with it.

The present invention has been made in view of this problem, and its object is to improve the accuracy of subject recognition in a moving image.

To that end, the present invention comprises: extraction means for extracting an area in which image information changes between a first frame of a moving image and a second frame that follows the first frame; setting means for setting a search area in the second frame based on a subject area relating to detection of a subject in the first frame and on the area extracted by the extraction means; and discrimination means for discriminating a subject within the search area of the second frame set by the setting means.

According to the present invention, the accuracy of subject recognition in a moving image can be improved.

FIG. 1 is a diagram showing the configuration of the image processing apparatus.
FIG. 2 is a flowchart of the processing for the initial frame.
FIG. 3 is a diagram showing an example of collation patterns.
FIG. 4 is a flowchart of the processing for a subsequent frame.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1(a) is a diagram showing the hardware configuration of the image processing apparatus according to the present embodiment. The image processing apparatus includes a CPU (Central Processing Unit) 1, a storage device 2, an input device 3, an output device 4, and an imaging device 5. The devices are configured to communicate with one another and are connected by a bus or the like.

The CPU 1 controls the operation of the image processing apparatus and executes programs stored in the storage device 2.
The storage device 2 is a storage device such as a magnetic storage device or a semiconductor memory. It stores programs that are read under the control of the CPU 1, data that must be retained for a long time, and the like.
In the present embodiment, the functions of the image processing apparatus and the processing in the flowcharts described later are realized by the CPU 1 executing processing according to the procedures of the programs stored in the storage device 2.

The input device 3 is a mouse, a keyboard, a touch panel device, buttons, or the like, and is used to input various instructions.
The output device 4 is a liquid crystal panel, an external monitor, a speaker, or the like, and outputs various kinds of information.
The imaging device 5 is a camcorder or the like and includes an image sensor such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor. Moving image data captured by the imaging device 5 is stored in the storage device 2 or the like. A moving image consists of a series of frames, each of which has a corresponding still image.

The hardware configuration of the image processing apparatus is not limited to the above. For example, the image processing apparatus may include an I/O device for communicating with other devices. The I/O device is, for example, an input/output unit for a memory card or a USB cable, or a wired or wireless transmission/reception unit.

FIG. 1(b) is a diagram showing the functional configuration of the image processing apparatus according to the present embodiment. The processing and functions of the image processing apparatus are realized by an image input unit 10, an image memory unit 20, an image reduction unit 30, a collation pattern extraction unit 40, a luminance correction unit 50, a face discrimination unit 60, a face probability distribution integration unit 70, a face area output unit 80, a change area extraction unit 90, and a search area setting unit 100.

The image input unit 10 reads the moving image data captured by the imaging device 5, extracts image data for each frame from the moving image data, and inputs the extracted image data to the image memory unit 20. The image input unit 10 may instead read moving image data from a storage medium, or read moving image data stored on a server or the like via the Internet or another network.
The image memory unit 20 is a storage area provided in the storage device 2. It temporarily stores the image data output from the image input unit 10. The image memory unit 20 may also be configured to temporarily store the moving image data itself.

The image reduction unit 30 reduces the image data corresponding to each frame of the moving image data stored in the image memory unit 20 according to predetermined magnifications and outputs a plurality of reduced images of different sizes.
The collation pattern extraction unit 40 extracts a predetermined partial area from the image data reduced by the image reduction unit 30 as a pattern to be collated (a so-called collation pattern).
The luminance correction unit 50 corrects the luminance distribution of the collation pattern extracted by the collation pattern extraction unit 40.
The face discrimination unit 60 outputs a face probability used to determine whether the collation pattern extracted by the collation pattern extraction unit 40 and corrected by the luminance correction unit 50 is a face pattern or a non-face pattern.

The face probability distribution integration unit 70 holds the distribution of face probabilities corresponding to the partial areas extracted from the plurality of reduced images and integrates the face probability distributions across a plurality of frames.
The face area output unit 80 outputs, to the output device 4 or the like, the partial areas corresponding to collation patterns that are judged to be faces based on the result of the integration by the face probability distribution integration unit 70.

The change area extraction unit 90 outputs an area (change area) in which the image data (image information) changes between frames because of movement of the subject.
The search area setting unit 100 sets the search area in the subsequent frame from the result of the face probability distribution integration unit 70 for the preceding frame and the inter-frame change area extracted by the change area extraction unit 90.

FIG. 2 is a flowchart of the processing performed by the image processing apparatus. The operation of the image processing apparatus for the initial frame is described with reference to FIG. 2.

First, the image input unit 10 inputs the image data to be processed into the image memory unit 20 (step S101).
The input image data is, for example, two-dimensional array data composed of 8-bit pixels and consists of three planes: R, G, and B. If the image data has been compressed by a method such as JPEG (Joint Photographic Experts Group), it is decompressed by the corresponding decompression method to obtain image data composed of RGB pixels.
Further, in the present embodiment, the image input unit 10 generates luminance image data consisting of luminance components (for example, image data with the color-difference components removed) from the luminance data contained in the RGB image data and uses it in the subsequent processing. The luminance image data is stored in the image memory unit 20. When YCrCb data is input as the image data, the Y component may be used directly as the luminance data to generate the luminance image data.
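The conversion from RGB planes to a luminance image in step S101 can be sketched as follows. This is only an illustration: the description does not state which weighting is used, so the ITU-R BT.601 luma coefficients and the function name `rgb_to_luminance` are assumptions.

```python
def rgb_to_luminance(rgb_rows):
    """Convert rows of 8-bit (R, G, B) tuples into rows of 8-bit luminance values."""
    # Assumed weights (ITU-R BT.601); the source does not specify the conversion.
    wr, wg, wb = 0.299, 0.587, 0.114
    return [
        [min(255, int(wr * r + wg * g + wb * b + 0.5)) for (r, g, b) in row]
        for row in rgb_rows
    ]


if __name__ == "__main__":
    frame = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
    print(rgb_to_luminance(frame))
```

If the input is already in a luminance/color-difference format, this step would be skipped and the Y plane used as is, as the text notes.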

Next, the image reduction unit 30 reads the luminance image data from the image memory unit 20 and generates luminance image data reduced by a predetermined magnification (so-called reduced luminance image data) (step S102). The present embodiment generates luminance image data at several reduced sizes and performs detection on each size in turn (see, for example, Reference 1), so faces of various sizes can be detected. For example, the image reduction unit 30 performs the reduction several times, producing images whose magnifications differ from one another by a factor of about 1.2.
Reference 1: Rowley et al., "Neural network-based face detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, January 1998
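As a rough illustration of step S102, the sizes of the successive reduced images can be enumerated as below. The stopping rule (stop once the image is smaller than the collation window) and the window size of 20 pixels are assumptions made for this sketch, not values given in the description.

```python
def pyramid_sizes(width, height, window=20, ratio=1.2):
    """List (scale, reduced_width, reduced_height) for each reduced image."""
    sizes = []
    scale = 1.0
    while True:
        w, h = int(width / scale), int(height / scale)
        if w < window or h < window:
            break  # the collation window no longer fits in the reduced image
        sizes.append((scale, w, h))
        scale *= ratio  # each level is about 1.2 times smaller than the last
    return sizes


if __name__ == "__main__":
    for level in pyramid_sizes(320, 240):
        print(level)
```

Because the collation window has a fixed size, a window on a more strongly reduced image covers a larger part of the original frame, which is how faces of different sizes are handled.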

Reference 1 proposes a method of detecting face patterns in an image using a neural network. The face detection method of Reference 1 is briefly described below.
First, the image data in which faces are to be detected is read into memory, and a predetermined area to be collated against a face is cut out of the read image. The distribution of pixel values in the cut-out area is then input to a neural network, which produces a single output. The weights and thresholds of the neural network are learned in advance from a very large number of face image patterns and non-face image patterns. For example, if the output of the neural network is 0 or greater, the pattern is judged to be a face; otherwise it is judged to be a non-face. The position from which the image pattern input to the neural network is cut out is scanned sequentially, for example vertically and horizontally over the entire image, so that faces are detected in the image. To handle faces of various sizes, the read image is successively reduced at a predetermined ratio, and the face detection scan described above is performed on each reduced image.

Next, the collation pattern extraction unit 40 extracts a partial area of a predetermined size from the reduced luminance image data and sets it as the collation pattern (step S103).
Collation patterns are described here with reference to FIG. 3.
Column A in FIG. 3 shows the reduced luminance images produced by the image reduction unit 30. In step S103, a partial area of a predetermined size (for example, a rectangular area) is cut out of each reduced luminance image. That is, rectangular areas of the same size are set in every reduced luminance image and are extracted one after another as collation patterns.
Column B in FIG. 3 shows the cutting-out in progress as each reduced luminance image is scanned sequentially in the vertical and horizontal directions. As the figure shows, when a collation pattern is cut out of a heavily reduced image and used for face discrimination, the face is detected over an area that is large relative to the image.
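The raster scan of step S103 can be sketched as follows. The window size, the scan step, and the helper names `window_positions` and `crop` are assumptions for illustration only.

```python
def window_positions(width, height, window=20, step=1):
    """Yield the top-left corner (x, y) of every window, scanning row by row."""
    for y in range(0, height - window + 1, step):
        for x in range(0, width - window + 1, step):
            yield x, y


def crop(image, x, y, window):
    """Cut out the window-sized collation pattern whose top-left corner is (x, y)."""
    return [row[x:x + window] for row in image[y:y + window]]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6]]
    for x, y in window_positions(3, 2, window=2):
        print((x, y), crop(image, x, y, 2))
```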

Next, the luminance correction unit 50 normalizes the luminance of the partial area cut out by the collation pattern extraction unit 40 based on its distribution (step S104). For example, the luminance correction unit 50 performs a luminance correction such as histogram equalization. Even though the luminance distribution of a captured subject pattern varies with the lighting conditions, correcting the luminance keeps the resulting loss of collation accuracy to a minimum.
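A minimal sketch of the luminance normalization in step S104, assuming plain histogram equalization on an 8-bit patch; the description names histogram-based correction only as one example, so the details here are illustrative.

```python
def equalize(patch, levels=256):
    """Histogram-equalize a patch given as rows of integer luminance values."""
    flat = [v for row in patch for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    # Cumulative distribution of luminance values in the patch.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:
        return [row[:] for row in patch]  # uniform patch: nothing to stretch
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in patch]


if __name__ == "__main__":
    print(equalize([[10, 20], [30, 40]]))
```

After equalization, two patches of the same face taken under dim and bright lighting map to similar value ranges, which is the property the text relies on.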

Next, the face discrimination unit 60 determines whether the collation pattern extracted by the collation pattern extraction unit 40 and corrected by the luminance correction unit 50 is a face pattern or a non-face pattern, and calculates a face probability as an index of whether it is a face pattern (step S105).
A known method (see, for example, References 1, 2, and 3) may be used for the face discrimination.
Reference 2: Schneiderman and Kanade, "A statistical method for 3D object detection applied to faces and cars", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000)
Reference 3: Viola and Jones, "Rapid Object Detection using Boosted Cascade of Simple Features", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'01)

Reference 2 treats the face probability of a collation pattern as an integrated model of statistical distributions over multiple appearances and performs discrimination on that basis.
Reference 3 focuses on processing speed. It uses AdaBoost to combine many weak classifiers effectively and so improve the accuracy of face discrimination, builds each weak classifier from Haar-type rectangular features, and computes those rectangular features quickly by using an integral image. The classifiers obtained by AdaBoost learning are also connected in series to form a cascade face detector. The cascade first uses the simple classifiers in the early stages (those requiring less computation) to discard, on the spot, candidate patterns that are clearly not faces. Only the remaining candidates are then passed to the more complex classifiers in the later stages (those requiring more computation and having higher discrimination performance) to decide whether they are faces. Because the complex decision does not have to be made for every candidate, the processing is fast.
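The integral-image idea mentioned for Reference 3 can be illustrated with the short sketch below. It shows only how a rectangle sum is obtained from four table lookups and how a simple two-rectangle Haar-type feature follows from it; the function names are hypothetical, and the sketch does not reproduce the trained detector of Reference 3.

```python
def integral_image(image):
    """Return a table ii where ii[y][x] is the sum of image[0:y][0:x]."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), in constant time."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half: one simple Haar-type rectangular feature."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    ii = integral_image([[1, 2], [3, 4]])
    print(rect_sum(ii, 0, 0, 2, 2), two_rect_feature(ii, 0, 0, 2, 2))
```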

In the present embodiment, for example, the output value of the neural network is output as the face probability. To improve the accuracy of the value output as the face probability, however, the relationship between the neural network's output value and the face probability may be stored in a table in advance, and the face probability may be obtained by looking up the table rather than by using the raw output value. Such a table can be created by preparing a sufficient number of face image patterns in advance and using the statistical distribution of the neural network's output values for those patterns.
When the face discrimination unit uses a plurality of classifiers, the face probability may be output as, for example, a weighted average of the classifiers' output values.
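The table lookup and the weighted average described above might look like the following sketch. The table layout (sorted output thresholds with one probability per interval), the example values, and the function names are all assumptions; the description does not specify how the table is organized.

```python
from bisect import bisect_right


def face_probability(output, thresholds, probabilities):
    """Map a raw classifier output to a face probability via a lookup table.

    thresholds is sorted ascending; probabilities has one more entry than
    thresholds, one per interval between consecutive thresholds.
    """
    return probabilities[bisect_right(thresholds, output)]


def combine(outputs, weights):
    """Weighted average of the outputs of several classifiers."""
    return sum(o * w for o, w in zip(outputs, weights)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical table; real values would come from training statistics.
    thresholds = [-0.5, 0.0, 0.5]
    probabilities = [0.05, 0.3, 0.7, 0.95]
    print(face_probability(0.2, thresholds, probabilities))
    print(combine([0.2, 0.8], [1, 3]))
```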

The face probability distribution integration unit 70 then integrates the face probability of the partial area obtained by the face discrimination unit 60 with the value for the corresponding partial area in the stored face probability distribution, and updates the face probability distribution (step S106).
For example, let P(s, x, y) be the face probability obtained by the face discrimination unit 60 at reduction magnification s and cut-out position (x, y). If P_OLD(s, x, y) denotes the value in the face probability distribution corresponding to reduction magnification s and cut-out position (x, y) (for the initial frame, a predetermined initial value), the value P_NEW(s, x, y) in the integrated face probability distribution is calculated by equation (1) below, where α is a predetermined integration parameter satisfying 0 < α < 1.

P_NEW(s, x, y) = α · P(s, x, y) + (1 − α) · P_OLD(s, x, y)   ... (1)
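Equation (1) is a weighted blend of the new observation and the stored value. A minimal sketch follows; the default α of 0.5 is chosen only for illustration.

```python
def integrate(p_observed, p_old, alpha=0.5):
    """Equation (1): blend the new face probability with the stored value."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return alpha * p_observed + (1.0 - alpha) * p_old


if __name__ == "__main__":
    p = 0.0  # assumed initial value of the stored distribution
    for observed in (0.9, 0.9, 0.9):
        p = integrate(observed, p, alpha=0.5)
        print(p)
```

A larger α makes the distribution follow the current frame more closely, while a smaller α gives more weight to the values accumulated from earlier frames.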

The processing from step S103 to step S106 is repeated while each reduced luminance image output by the image reduction unit 30 is scanned sequentially in the vertical and horizontal directions, as shown in FIG. 3. Reductions at different magnifications are applied in turn, and the processing from step S102 to step S106 is repeated for each.

After the search has finished for all reduced luminance images at the predetermined magnifications, the face area output unit 80 outputs, to the output device 4 or the like, each area whose value in the face probability distribution updated by the face probability distribution integration unit 70 is at or above a predetermined value and is a maximum (possibly a local maximum) within the distribution, as a face area (subject area) (step S107). In other words, because a face probability distribution is kept for each reduced luminance image, the face area output unit 80 can output a face area for each reduced luminance image.
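The selection in step S107 can be sketched as a search for thresholded local maxima in one face probability distribution. The 8-neighbour definition of a local maximum and the function name `face_regions` are assumptions for this illustration.

```python
def face_regions(dist, threshold):
    """Return (x, y, p) for cells at or above threshold that are local maxima."""
    h, w = len(dist), len(dist[0])
    found = []
    for y in range(h):
        for x in range(w):
            p = dist[y][x]
            if p < threshold:
                continue
            # Compare against the 8 surrounding cells that lie inside the grid.
            neighbours = [
                dist[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if (i, j) != (x, y)
            ]
            if all(p >= q for q in neighbours):
                found.append((x, y, p))
    return found


if __name__ == "__main__":
    dist = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.3], [0.1, 0.3, 0.8]]
    print(face_regions(dist, 0.5))
```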

Next, the operation of the image processing apparatus for a subsequent frame, which uses the detection result for the initial frame, is described with reference to FIG. 4.

First, the image input unit 10 inputs, to the image memory unit 20, the luminance image data corresponding to a subsequent frame (for example, the second frame) that follows the initial frame (for example, the first frame) (step S201).

Next, the change area extraction unit 90, which is an example of the extraction means, outputs an area (change area) in which the luminance data changes between frames because of movement of the subject (step S202).
For example, the change area extraction unit 90 computes the difference between the luminance values of the luminance images of the frames and takes as the change area the area containing pixels whose luminance difference exceeds a predetermined threshold.
More specifically, the change area extraction unit 90 performs binarization to distinguish pixels in the change area from pixels outside it. To reduce the influence of image noise, it then repeatedly smooths the binarized luminance image data using the data in a predetermined neighborhood, merging the result into change areas of at least a predetermined size. The change area extraction unit 90 may, for example, calculate the change area by applying a morphological opening to the binarized luminance image data.
In addition to, or instead of, calculating the change area from the difference in luminance values between frames, the change area extraction unit 90 may extract a skin color probability distribution from the color image of each frame and calculate the change area from the difference in skin color probability between frames. The skin color probability is calculated, for example, by a method that uses a Gaussian mixture model representing the probability distribution of skin color (see, for example, Reference 4).
Reference 4: Jones and Rehg, "Statistical color models with application to skin detection", International Journal of Computer Vision, Vol. 46, No. 1, January 2002
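The frame-difference variant of step S202 can be sketched as follows: threshold the absolute luminance difference to obtain a binary mask, then apply a morphological opening (erosion followed by dilation) to remove isolated noise pixels. The 3x3 neighbourhood and the border handling (neighbours outside the image are ignored) are assumptions of this sketch.

```python
def change_mask(prev, curr, threshold):
    """Binarize the absolute luminance difference between two frames."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]


def _filter3x3(mask, op):
    """Apply op (min for erosion, max for dilation) over each 3x3 neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = op(
                mask[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
    return out


def opening(mask):
    """Morphological opening: erosion followed by dilation."""
    return _filter3x3(_filter3x3(mask, min), max)


if __name__ == "__main__":
    prev = [[10] * 5 for _ in range(5)]
    curr = [[10] * 5 for _ in range(5)]
    curr[0][0] = 90  # isolated noise pixel
    for y in range(2, 5):
        for x in range(2, 5):
            curr[y][x] = 90  # moving subject
    for row in opening(change_mask(prev, curr, 30)):
        print(row)
```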

For a moving image captured by a fixed camera, the change area can be extracted by the simple processing above. When the camera is not fixed, however, the change area does not necessarily correspond to the area of the moving subject, so the area may be extracted by the following processing instead. First, the change area extraction unit 90 extracts motion vectors over the entire image from the luminance image data of a plurality of frames. It then calculates the motion parameters of the camera from the distribution of the motion vectors. Next, it compensates for the amount of movement corresponding to the calculated camera motion parameters and extracts the motion component caused by movement of the subject, separated from the motion of the camera. The change area extraction unit 90 takes as the change area the area in which this subject motion component is at or above a predetermined value. Finally, it applies to the extracted change area the smoothing described above, which takes the influence of image noise into account.
A known method (see, for example, Reference 5) can be used to extract motion vectors from the luminance image data of a plurality of frames and calculate the camera motion parameters.
Reference 5: Takekawa and Miyajima, "Three-dimensional motion and shape analysis from time-series images", Computer Vision: Technical Review and Future Prospects, New Technology Communications (1998)
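The camera-motion compensation can be illustrated with a deliberately simplified model. The sketch below assumes that the camera motion is a pure image translation and estimates it as the per-component median of the motion vectors; this is a stand-in chosen for brevity, not the method of Reference 5, and the function name `subject_motion` is hypothetical.

```python
from statistics import median


def subject_motion(vectors, min_magnitude):
    """Separate subject motion from an assumed global camera translation.

    vectors maps a block position (x, y) to its motion vector (dx, dy).
    Returns the estimated camera translation and the blocks whose residual
    motion has a magnitude of at least min_magnitude.
    """
    # Assumption: most blocks belong to the background, so the median
    # vector approximates the motion induced by the camera.
    cam_dx = median(v[0] for v in vectors.values())
    cam_dy = median(v[1] for v in vectors.values())
    moving = {}
    for pos, (dx, dy) in vectors.items():
        rx, ry = dx - cam_dx, dy - cam_dy
        if (rx * rx + ry * ry) ** 0.5 >= min_magnitude:
            moving[pos] = (rx, ry)
    return (cam_dx, cam_dy), moving


if __name__ == "__main__":
    vectors = {(0, 0): (2, 0), (1, 0): (2, 0), (2, 0): (2, 0),
               (0, 1): (2, 0), (1, 1): (7, 0)}
    print(subject_motion(vectors, 3))
```

A real implementation would typically fit a richer camera model (for example, one including rotation and zoom) before subtracting it from the motion field.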

Next, the search area setting unit 100, which is an example of the setting means, sets the search area in the subsequent frame from the result of the face area output unit 80 for the preceding frame (for example, the face area) and the inter-frame change area extracted by the change area extraction unit 90 (step S203).
More specifically, the search area setting unit 100 first extracts the area output by the face area output unit 80 as a first search area. It then extracts the inter-frame change area extracted by the change area extraction unit 90 as a second search area. Where reduction has been applied, the search area setting unit 100 reduces the change area by the same reduction, sets the result within the luminance image obtained by that reduction, and extracts it as the second search area. The search area setting unit 100 then sets the logical sum (union) of the first search area and the second search area as the search area.
The first search area (for example, the face area) may instead consist of all areas whose value in the face probability distribution output by the face probability distribution integration unit 70 is at or above a predetermined value. The first search area may also include a predetermined neighborhood of the face area, for example, the partial areas adjacent to the face area.
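Step S203 can be sketched with binary masks: the change mask is reduced to the size of the current reduced image and then combined with the face mask by a logical OR. Nearest-neighbour sampling for the reduction and the function names are assumptions of this sketch.

```python
def reduce_mask(mask, scale):
    """Shrink a binary mask by the given scale using nearest-neighbour sampling."""
    h, w = int(len(mask) / scale), int(len(mask[0]) / scale)
    return [[mask[int(y * scale)][int(x * scale)] for x in range(w)] for y in range(h)]


def search_region(face_mask, change_mask):
    """Logical sum (union) of the first and second search areas."""
    return [[a | b for a, b in zip(frow, crow)] for frow, crow in zip(face_mask, change_mask)]


if __name__ == "__main__":
    face = [[1, 0], [0, 0]]                      # first search area, already at reduced size
    change = [[0, 0, 1, 1], [0, 0, 1, 1],
              [0, 0, 0, 0], [0, 0, 0, 0]]        # second search area at full size
    print(search_region(face, reduce_mask(change, 2.0)))
```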

Next, the image reduction unit 30 reads the luminance image data from the image memory unit 20 and generates luminance image data reduced by a predetermined magnification (step S204).

Next, the collation pattern extraction unit 40 sets a partial area of a predetermined size to be extracted from the reduced luminance image data (step S205), and determines whether that partial area belongs to the search area set by the search area setting unit 100 (step S206). Here, a partial area belongs to the search area if it includes part or all of the search area.

If the partial area set in step S205 belongs to the search area, the collation pattern extraction unit 40 extracts that partial area and sets it as the collation pattern (step S207). If the partial area set in step S205 does not belong to the search area, the process returns to step S205, and scanning is repeated sequentially in the vertical and horizontal directions. When scanning of one luminance image or one reduced luminance image is complete, the process returns to step S204.
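Steps S205 to S207 can be sketched as a raster scan that keeps only the partial areas overlapping the search area. This is an illustrative sketch rather than the embodiment's implementation: the search area is assumed to be a boolean mask, the partial area is assumed to be square, and the names `window_overlaps`, `candidate_windows`, `size`, and `step` are hypothetical.

```python
from typing import Iterator, List, Tuple

Mask = List[List[bool]]


def window_overlaps(search: Mask, x: int, y: int, size: int) -> bool:
    """True if the size-by-size window at (x, y) contains any part of the search area."""
    return any(search[j][i]
               for j in range(y, y + size)
               for i in range(x, x + size))


def candidate_windows(search: Mask, size: int, step: int = 1) -> Iterator[Tuple[int, int]]:
    """Scan row by row and yield the top-left corner of each window that belongs to the search area."""
    h, w = len(search), len(search[0])
    for y in range(0, h - size + 1, step):
        for x in range(0, w - size + 1, step):
            if window_overlaps(search, x, y, size):  # step S206
                yield (x, y)                         # step S207: kept as a collation pattern
```

Windows that do not overlap the search area are skipped without any discrimination, which is where the reduction in processing comes from.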

Next, as in step S104, the luminance correction unit 50 normalizes the luminance of the partial area cut out by the collation pattern extraction unit 40 on the basis of its luminance distribution (step S208).

Next, the face discrimination unit 60, which is an example of a discriminating means, performs the same processing as in step S105 (step S209). That is, the face discrimination unit 60 discriminates whether the collation pattern (the subject discriminated within the search area) extracted by the collation pattern extraction unit 40 and corrected by the luminance correction unit 50 is a face pattern or a non-face pattern, and also calculates a face probability.

Then, the face probability distribution integration unit 70, which is an example of an integration means, performs the same processing as in step S106 (step S210). That is, the face probability distribution integration unit 70 updates the face probability distribution by integrating the face probability (the result of discriminating the subject within the search area of the subsequent frame) with the value at the corresponding partial area of the face probability distribution calculated up to the previous frame (the result of discriminating the subject in the area corresponding to the search area of the subsequent frame). Letting P(s, x, y) be the face probability obtained by the face discrimination unit 60 and P_OLD(s, x, y) be the value in the face probability distribution (here, the integrated face probability distribution up to the previous frame), the value P_NEW(s, x, y) in the integrated face probability distribution is calculated by equation (1) above.

The processing from step S205 to step S210 is repeated while scanning each reduced luminance image output by the image reduction unit 30 sequentially in the vertical and horizontal directions. In addition, reduction processing at different magnifications is applied in turn, and the processing from step S204 to step S210 is repeated. That is, for each reduced luminance image, the face pattern is searched for within the area set as the search area.
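The nesting of the two loops (steps S204 to S210) can be sketched as follows. Several simplifications are assumptions made only for illustration: the reduction uses integer factors, a reduced mask cell is set when any source cell in its block is set, and the discrimination of steps S208 and S209 is represented by a caller-supplied `classify` callback. The names `reduce_mask` and `multiscale_search` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Mask = List[List[bool]]


def reduce_mask(mask: Mask, factor: int) -> Mask:
    """Shrink a mask by an integer factor; a reduced cell is True if any cell in its block is True."""
    h, w = len(mask) // factor, len(mask[0]) // factor
    return [[any(mask[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor))
             for x in range(w)]
            for y in range(h)]


def multiscale_search(mask: Mask, factors: List[int], size: int,
                      classify: Callable[[int, int, int], float]
                      ) -> Dict[Tuple[int, int, int], float]:
    """Run the window scan at every scale and collect a face probability per (s, x, y)."""
    results: Dict[Tuple[int, int, int], float] = {}
    for s in factors:                      # outer loop: one pass per magnification (S204)
        reduced = reduce_mask(mask, s)
        h = len(reduced)
        w = len(reduced[0]) if h else 0
        for y in range(h - size + 1):      # inner loops: vertical and horizontal scan (S205)
            for x in range(w - size + 1):
                if any(reduced[y + j][x + i]
                       for j in range(size) for i in range(size)):
                    results[(s, x, y)] = classify(s, x, y)  # stands in for S208 and S209
    return results
```

The keys `(s, x, y)` mirror the arguments of P(s, x, y) in the text; how the returned probabilities are merged into the face probability distribution is governed by equation (1) and is not reproduced here.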

Then, the face area output unit 80, which is an example of an output means, performs the same processing as in step S107 (step S211). That is, the face area output unit 80 outputs an area relating to the subject on the basis of the integrated result. More specifically, the face area output unit 80 outputs, to the output device 4 or the like, as a face area, each area whose value in the face probability distribution updated by the face probability distribution integration unit 70 is equal to or greater than a predetermined value and is a local maximum within the face probability distribution.
Note that the processing from step S201 to step S211 is repeated sequentially until all frames of the moving image data have been processed.
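The selection rule of step S211 (a value at or above a predetermined value that is also a local maximum) can be sketched for a single scale as follows. The 8-neighborhood, the treatment of ties, and the names `face_regions` and `threshold` are assumptions for illustration; the embodiment does not specify them here.

```python
from typing import List, Tuple


def face_regions(dist: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return (x, y) positions whose value is >= threshold and not exceeded by any 8-neighbor."""
    h, w = len(dist), len(dist[0])
    peaks: List[Tuple[int, int]] = []
    for y in range(h):
        for x in range(w):
            v = dist[y][x]
            if v < threshold:
                continue  # below the predetermined value
            neighbors = [dist[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2))
                         if (ny, nx) != (y, x)]
            if all(v >= n for n in neighbors):  # local maximum (ties are kept)
                peaks.append((x, y))
    return peaks
```

In the distribution described in the text the search would also run across the scale index s; the sketch is limited to one scale to keep it short.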

Note that when the subject moves slowly relative to the frame interval, the processing need not be performed on every frame; the detection processing may instead be performed at a predetermined frame interval (see, for example, Reference 6).
Reference 6: Mikolajczyk et al., "Face detection in a video sequence - a temporal approach", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'01)

Reference 6 proposes a method that, building on the technique described in Reference 2, predicts the state of the face in the next frame from the face detection result of a given frame and applies face discrimination processing to that prediction to update the face detection result. It also proposes, among other things, performing a full search every five frames.

In the present embodiment, a human face is detected as the subject pattern, but patterns of subjects other than a person may be used instead.

<Other embodiments>
The present invention can also be realized by executing the following processing: software (a program) that implements the functions of the embodiments described above is supplied to a system or apparatus via a network or any of various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads out and executes the program.

As described above, according to each of the embodiments, the accuracy of subject recognition in a moving image can be improved.
That is, the configuration of each embodiment provides an apparatus that detects a predetermined subject from a moving image at high speed and with high accuracy. Specifically, narrowing the search area to the subject area and the inter-frame change area speeds up processing. Because the search area includes the subject area, subjects are still handled when the temporal change between frames is small. Because the search area also includes the inter-frame change area, newly appearing subjects are handled as well. Furthermore, because detection results from multiple frames are integrated, detection is more stable and more accurate than when subject detection is performed independently on each frame.

Preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

10 image input unit, 20 image memory unit, 30 image reduction unit, 40 collation pattern extraction unit, 50 luminance correction unit, 60 face discrimination unit, 70 face probability distribution integration unit, 80 face area output unit, 90 change area extraction unit, 100 search area setting unit

Claims

An image processing apparatus comprising:
extraction means for extracting an area in which image information changes between a first frame of a moving image and a second frame subsequent to the first frame;
setting means for setting a search area in the second frame on the basis of a subject area relating to detection of a subject in the first frame and the area extracted by the extraction means; and
discriminating means for discriminating a subject within the search area of the second frame set by the setting means.

The image processing apparatus according to claim 1, further comprising:
integration means for integrating the result of discrimination of the subject within the search area of the second frame by the discriminating means with the result of discrimination of the subject in the area of the first frame corresponding to the search area of the second frame; and
output means for outputting a subject area relating to detection of the subject in the second frame on the basis of the result of integration by the integration means.

An image processing method comprising:
an extraction step of extracting an area in which image information changes between a first frame of a moving image and a second frame subsequent to the first frame;
a setting step of setting a search area in the second frame on the basis of a subject area relating to detection of a subject in the first frame and the area extracted in the extraction step; and
a discriminating step of discriminating a subject within the search area of the second frame set in the setting step.

A program that causes a computer to execute:
an extraction step of extracting an area in which image information changes between a first frame of a moving image and a second frame subsequent to the first frame;
a setting step of setting a search area in the second frame on the basis of a subject area relating to detection of a subject in the first frame and the area extracted in the extraction step; and
a discriminating step of discriminating a subject within the search area of the second frame set in the setting step.