JP2011150594A

JP2011150594A - Image processor and image processing method, and program

Info

Publication number: JP2011150594A
Application number: JP2010012258A
Authority: JP
Inventors: Kazuki Aisaka; 一樹相坂; Masatoshi Yokokawa; 昌俊横川; Atsushi Murayama; 淳村山
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2010-01-22
Filing date: 2010-01-22
Publication date: 2011-08-04

Abstract

PROBLEM TO BE SOLVED: To track an object in a more simple, quick and stable manner. SOLUTION: A flatness determination part 51 determines whether an input image is a flat image, based on dispersion of pixel values of pixels in respective regions of the input image. When the input image is not flat, a tracking part 52 detects movement between frames of the input image to detect an object to be tracked from the input image. When the input image is flat, the tracking part 53 extracts feature quantities of features from the input image, generates an object map indicating the likeness of the object in respective regions of the input image from the feature quantities and detects the object from the input image by using the object map. Since the object is detected by a different tracking method according to whether the input image is flat, the object can be more stably tracked. This invention is applicable to an imaging apparatus. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an image processing apparatus and method, and a program, and more particularly to an image processing apparatus and method, and a program, capable of tracking a subject more simply, quickly, and stably.

In recent years, imaging apparatuses have become known that, when a plurality of images are captured in succession, such as when capturing so-called preview images presented to the user before the shutter is operated, have a function of tracking a subject that the user has selected on the captured images.

Imaging apparatuses that track a subject in this way include those that perform tracking using a silicon retina (see, for example, Patent Document 1) and those that perform tracking using depth-sensing imaging technology (see, for example, Patent Document 2).

In addition, a method of tracking a subject using the Lucas-Kanade algorithm has been proposed (see, for example, Non-Patent Document 1). In this method, subject tracking is realized by detecting feature points in an image and tracking those feature points.

Patent Document 1: JP 2004-240591 A
Patent Document 2: JP 2007-514211 A (published Japanese translation of a PCT application)

Non-Patent Document 1: Jean-Yves Bouguet, "Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the Algorithm," Intel Corporation Microprocessor Research Labs (2000), OpenCV Documents

However, the methods that use a silicon retina or depth-sensing imaging technology require special equipment such as a silicon retina or a depth camera. It has therefore been difficult to realize subject tracking with a general imaging apparatus such as a camera.

In addition, the tracking method using the Lucas-Kanade algorithm requires a large amount of processing for feature point detection and tracking, and is therefore time-consuming. Furthermore, this method cannot track stably when, for example, the shape of the subject changes drastically. For example, when a person who is the subject squats down while walking, the feature amounts of the feet, which had been used as features until then, can no longer be obtained. Consequently, when the person later stands up and starts walking again, only the parts other than the feet can be tracked.

The present invention has been made in view of such circumstances, and makes it possible to track a subject more simply, quickly, and stably.

An image processing apparatus according to one aspect of the present invention detects a subject from each of a plurality of consecutive frames of input images, and includes: tracking means for detecting a subject to be tracked from the input image of the current frame being processed, based on that input image and on the detection result for the subject to be tracked in a previous frame temporally preceding the current frame; and switching means for extracting a feature amount of a predetermined first feature from the input image, identifying a characteristic of the input image based on the feature amount, and, according to that characteristic, causing one of a plurality of the tracking means, which detect the subject to be tracked from the input image by mutually different methods, to detect the subject to be tracked.

The switching means may include: flatness calculation means for extracting, as the feature amount, the variance of the pixel values of the pixels in each region of the input image, and calculating from the variance values a flatness indicating the degree of flatness of the input image; and determination means for determining from the flatness whether the input image is a flat image, that is, an image with little change in pixel value in the spatial direction, and, according to the determination result, causing one of the plurality of tracking means to detect the subject to be tracked.

The tracking means may include first tracking means and second tracking means. When the input image is not a flat image, the first tracking means performs motion detection using the subject region, which contains the subject to be tracked on the input image of the previous frame, and the input image of the current frame, and obtains the motion of the subject region, thereby detecting the region of the subject to be tracked on the input image of the current frame. When the input image is a flat image, the second tracking means extracts feature amounts of a plurality of second features from the input image of the current frame and generates from them a subject map indicating the subject likelihood in each region of the input image. Among the subject-like regions of the input image identified by the subject map, it then detects the region that includes the region at the same position as the subject region of the previous frame as the region of the subject to be tracked on the input image of the current frame.

The switching means may further include face detection means for detecting a human face from the input image of the current frame, and the tracking means may further include third tracking means. Based on the result of detecting human faces from the input image of the current frame, the third tracking means detects, among the face regions detected from that input image, the region closest to the subject region of the previous frame as the region of the subject to be tracked on the input image of the current frame. When a human face is detected from the input image, the face detection means may cause the third tracking means to detect the subject to be tracked; when no human face is detected from the input image, it may cause the flatness calculation means to calculate the flatness.

The switching means may include: foreground histogram generation means for extracting, as the feature amount, the color components of the pixels of the input image and generating a foreground histogram indicating the color distribution of the pixels in the subject region, which contains the subject to be tracked on the input image of the previous frame; background histogram generation means for extracting, as the feature amount, the color components of the pixels of the input image and generating a background histogram indicating the color distribution of the pixels in the area of the input image of the previous frame other than the subject region; and determination means for causing one of the plurality of tracking means to detect the subject to be tracked according to a distance indicating the degree of similarity between the foreground histogram and the background histogram.

The tracking means may include fourth tracking means and fifth tracking means. When the distance is equal to or less than a predetermined threshold, the fourth tracking means detects the subject to be tracked on the input image of the current frame by searching a contour image, which shows the contours of subjects in each region of the input image of the current frame, for the region most similar to a foreground contour image showing the contour of the subject in the subject region of the previous frame. When the distance is greater than the threshold, the fifth tracking means detects the subject to be tracked on the input image of the current frame by searching that input image for the region whose color-distribution histogram is most similar to the foreground histogram.
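As an illustration of the histogram-based switching described above, the following is a minimal Python sketch, not the claimed implementation. The color quantization into `bins` levels per channel, the use of an L1 distance between normalized histograms, and the names `color_histogram`, `histogram_distance`, and `choose_tracker` are all assumptions made for this example; the text only states that a distance indicating the degree of similarity between the two histograms is compared with a threshold.

```python
def color_histogram(pixels, bins=4):
    """Normalized histogram of quantized (r, g, b) pixels with values 0..255.

    Assumption: each channel is quantized into `bins` equal levels.
    """
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1
    n = len(pixels)
    return [count / n for count in hist] if n else hist


def histogram_distance(a, b):
    """Assumed distance: L1 difference (0 = identical, 2 = disjoint)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def choose_tracker(foreground_pixels, background_pixels, threshold):
    """Pick contour matching when colors are similar, color matching otherwise."""
    distance = histogram_distance(color_histogram(foreground_pixels),
                                  color_histogram(background_pixels))
    # Small distance: the subject and background colors are alike, so color is
    # not discriminative and the contour-based (fourth) tracker is chosen.
    return "contour" if distance <= threshold else "color_histogram"
```

With a red subject on a blue background the two histograms do not overlap, so the sketch selects the color-histogram tracker; with identical color distributions it falls back to contour matching.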

An image processing method or program according to one aspect of the present invention detects a subject from each of a plurality of consecutive frames of input images, and includes the steps of: extracting a feature amount of a predetermined feature from the input image, identifying a characteristic of the input image based on the feature amount, and, according to that characteristic, causing one of a plurality of tracking means, which detect a subject to be tracked from the input image by mutually different methods, to detect the subject to be tracked; and detecting, by the tracking means, the subject to be tracked from the input image of the current frame being processed, based on that input image and on the detection result for the subject to be tracked in a previous frame temporally preceding the current frame.

In one aspect of the present invention, when a subject is detected from each of a plurality of consecutive frames of input images, a feature amount of a predetermined feature is extracted from the input image and a characteristic of the input image is identified based on the feature amount. According to that characteristic, one of a plurality of tracking means, which detect a subject to be tracked from the input image by mutually different methods, detects the subject to be tracked. The tracking means detects the subject to be tracked from the input image of the current frame being processed, based on that input image and on the detection result for the subject to be tracked in a previous frame temporally preceding the current frame.

According to one aspect of the present invention, a subject can be tracked more simply, quickly, and stably.

FIG. 1 is a diagram for explaining an outline of an image processing apparatus to which the present invention is applied.
FIG. 2 is a diagram showing a configuration example of the image processing apparatus.
FIG. 3 is a diagram showing a configuration example of a tracking unit.
FIG. 4 is a diagram showing a configuration example of a tracking unit.
FIG. 5 is a diagram showing a configuration example of a subject extraction unit.
FIG. 6 is a diagram showing a configuration example of a luminance information extraction unit.
FIG. 7 is a diagram showing a configuration example of a color information extraction unit.
FIG. 8 is a diagram showing a configuration example of an edge information extraction unit.
FIG. 9 is a diagram showing a configuration example of a face information extraction unit.
FIG. 10 is a flowchart explaining the tracking process.
FIG. 11 is a flowchart explaining the flatness calculation process.
FIG. 12 is a flowchart explaining the subject detection process by motion detection.
FIG. 13 is a diagram explaining the detection of a subject by motion detection.
FIG. 14 is a flowchart explaining the subject detection process by visual attention.
FIG. 15 is a diagram explaining the detection of a subject by visual attention.
FIG. 16 is a flowchart explaining the subject map generation process.
FIG. 17 is a flowchart explaining the luminance information extraction process.
FIG. 18 is a flowchart explaining the color information extraction process.
FIG. 19 is a flowchart explaining the edge information extraction process.
FIG. 20 is a flowchart explaining the face information extraction process.
FIG. 21 is a diagram showing a configuration example of the image processing apparatus.
FIG. 22 is a flowchart explaining the tracking process.
FIG. 23 is a diagram showing a configuration example of the image processing apparatus.
FIG. 24 is a diagram showing a configuration example of a tracking unit.
FIG. 25 is a diagram showing a configuration example of a tracking unit.
FIG. 26 is a flowchart explaining the tracking process.
FIG. 27 is a block diagram showing a configuration example of a computer.

Embodiments to which the present invention is applied will be described below with reference to the drawings.

<Overview of the Invention>
[Configuration of the Image Processing Apparatus]
FIG. 1 is a diagram for explaining an outline of an image processing apparatus to which the present invention is applied.

The image processing apparatus 11 to which the present invention is applied includes a switching unit 21, tracking units 22-1 to 22-N, a display control unit 23, and a display unit 24.

For example, the image processing apparatus 11 is provided in an imaging apparatus, such as a camera, that captures a subject, and the input images of a plurality of frames captured successively in time by the imaging apparatus are supplied in sequence to the switching unit 21 and the display control unit 23. The input images may be still images captured in succession or a moving image. The input images may also have been recorded on a recording medium after capture and then read out from the recording medium.

The switching unit 21 extracts, from the supplied input image, the feature amount of a feature of the input image, and identifies a characteristic of the input image based on the extracted feature amount. The switching unit 21 then selects a tracking method suited to that characteristic and has the subject on the input image tracked by that method. That is, according to the characteristic of the input image, the switching unit 21 supplies the input image to one of the tracking units 22-1 to 22-N, which perform tracking by mutually different tracking methods, and instructs it to execute tracking.
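The switching behavior described above can be pictured as a per-frame dispatch. The following Python sketch is only an illustration under stated assumptions: the `classify` callback, the dictionary of trackers keyed by characteristic name, and the `(x, y, width, height)` box representation are hypothetical and do not come from the text.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]           # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]   # (x, y, width, height), an assumed format
Tracker = Callable[[Image, Box], Box]


def select_tracker(image: Image,
                   classify: Callable[[Image], str],
                   trackers: Dict[str, Tracker]) -> Tracker:
    """Return the tracker registered for the characteristic of this frame."""
    return trackers[classify(image)]


def track_sequence(frames: Sequence[Image],
                   initial_box: Box,
                   classify: Callable[[Image], str],
                   trackers: Dict[str, Tracker]) -> List[Box]:
    """Classify each frame, dispatch to one tracker, carry the result forward."""
    box = initial_box  # the subject designated when tracking starts
    results = []
    for frame in frames:
        tracker = select_tracker(frame, classify, trackers)
        box = tracker(frame, box)  # uses the previous frame's detection result
        results.append(box)
    return results
```

Keeping the classification separate from the trackers means a new tracking method can be added by registering one more entry, which mirrors the role of the tracking units 22-1 to 22-N.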

In response to an instruction from the switching unit 21, each of the tracking units 22-1 to 22-N detects the subject to be tracked from the input image by its predetermined tracking method, and supplies the detection result to the display control unit 23.

Note that the subject to be tracked is designated by the user at the start of tracking. Hereinafter, the tracking units 22-1 to 22-N are also referred to simply as the tracking unit 22 when there is no need to distinguish them individually.

Using the subject detection result supplied from the tracking unit 22 and the supplied input image, the display control unit 23 processes the input image so that a frame surrounding the region of the subject to be tracked (hereinafter referred to as a subject frame) is displayed on the input image. The display control unit 23 then supplies the processed input image to the display unit 24 and causes it to be displayed. As a result, the subject frame is displayed together with the input image.
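A minimal Python sketch of overlaying a subject frame is given below for illustration. The one-pixel frame width, the grayscale list-of-rows image, the `(x, y, width, height)` box format, and the function name `draw_subject_frame` are assumptions; the text does not specify how the frame is rendered.

```python
def draw_subject_frame(image, box, value=255):
    """Return a copy of `image` with a one-pixel frame drawn around `box`.

    Assumes `box` = (x, y, width, height) lies entirely inside the image.
    """
    x, y, w, h = box
    out = [row[:] for row in image]  # leave the input image untouched
    for cx in range(x, x + w):       # top and bottom edges
        out[y][cx] = value
        out[y + h - 1][cx] = value
    for cy in range(y, y + h):       # left and right edges
        out[cy][x] = value
        out[cy][x + w - 1] = value
    return out
```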

The image processing apparatus 11 tracks the subject by performing the processing described above for each frame of the input image. Because the display unit 24 displays, together with the input image, the subject frame surrounding the subject designated by the user, the user can decide on a composition while viewing the input image and the subject frame, and operate the imaging apparatus to capture a still image or the like. In addition, based on the result of the tracking processing by the image processing apparatus 11, the imaging apparatus can adjust the focus so that the lens is focused on the subject to be tracked, or adjust the exposure so that the subject is appropriately bright.

In the image processing apparatus 11, a tracking method is determined in advance for each characteristic of the input image: the method that handles input images with that characteristic well, that is, the method that can detect the subject with less processing and higher accuracy when an input image with that characteristic is processed. Once the characteristic of the input image has been identified, a tracking method is selected based on the result and the subject is detected. The image processing apparatus 11 can thereby track the subject more simply, quickly, and stably.

A more specific configuration example of the image processing apparatus 11 will be described below with reference to the drawings.

<First Embodiment>
[Configuration of the Image Processing Apparatus]
FIG. 2 is a diagram showing a configuration example of an embodiment of the image processing apparatus 11 to which the present invention is applied.

The image processing apparatus 11 of FIG. 2 includes a flatness determination unit 51, a tracking unit 52, a tracking unit 53, a holding unit 54, a display control unit 23, and a display unit 24. In the image processing apparatus 11, the input image captured by the imaging apparatus is supplied to the flatness determination unit 51, the holding unit 54, and the display control unit 23. In FIG. 2, parts corresponding to those in FIG. 1 are denoted by the same reference numerals, and their description is omitted as appropriate.

The flatness determination unit 51 corresponds to the switching unit 21 in FIG. 1. It extracts, as the feature amount, the variance of the pixel values of the pixels of the supplied input image, and thereby identifies the characteristic of the input image using the flatness of the image as an index. That is, the flatness determination unit 51 determines whether or not the input image is a flat image. Here, a flat image is an image in which the pixel values change little in the direction in which the pixels are arranged, that is, in the spatial direction.

The flatness determination unit 51 includes a variance value calculation unit 61, a flatness calculation unit 62, and a determination unit 63. The input image of each frame from the imaging apparatus is supplied to the variance value calculation unit 61 and the determination unit 63.

The variance value calculation unit 61 divides the supplied input image into a plurality of blocks, obtains for each block the variance of the pixel values of the pixels in the block, and supplies the variance values to the flatness calculation unit 62. The flatness calculation unit 62 calculates a flatness indicating the degree of flatness of the input image based on the variance value of each block supplied from the variance value calculation unit 61, and supplies the flatness to the determination unit 63.
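The per-block variance computation can be sketched in Python as follows. This is an illustration only: the passage above does not say how the flatness is derived from the block variances, so the definition used here (the share of blocks whose variance is at most `var_threshold`) is an assumption, as are the function names and the square, non-overlapping blocks.

```python
def block_variances(image, block):
    """Variance of pixel values in each non-overlapping block x block tile.

    Tiles are visited row by row; partial tiles at the edges are skipped.
    """
    height, width = len(image), len(image[0])
    variances = []
    for y in range(0, height - block + 1, block):
        for x in range(0, width - block + 1, block):
            pixels = [image[y + dy][x + dx]
                      for dy in range(block) for dx in range(block)]
            mean = sum(pixels) / len(pixels)
            variances.append(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return variances


def flatness(variances, var_threshold):
    """Assumed definition: fraction of blocks with variance <= var_threshold."""
    if not variances:
        return 0.0
    low = sum(1 for v in variances if v <= var_threshold)
    return low / len(variances)
```

For a 4x4 image in which three of the four 2x2 blocks are uniform, this sketch yields a flatness of 0.75.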

The determination unit 63 compares the flatness supplied from the flatness calculation unit 62 with a predetermined threshold to determine whether or not the input image is a flat image. According to the result of this determination, the determination unit 63 supplies the input image to either the tracking unit 52 or the tracking unit 53 and instructs it to detect the subject.
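The routing decision can be written as a single comparison, sketched below in Python. The mapping of a flat image to the tracking unit 53 and a non-flat image to the tracking unit 52 follows the abstract; the direction of the comparison (a larger flatness value meaning a flatter image) and the name `route_frame` are assumptions.

```python
def route_frame(flatness, flat_threshold):
    """Return the name of the tracking unit that should handle the frame."""
    # Assumption: a larger flatness value means a flatter image.
    if flatness >= flat_threshold:
        return "tracking_unit_53"  # flat image: subject-map based detection
    return "tracking_unit_52"      # non-flat image: motion detection
```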

The tracking unit 52 and the tracking unit 53 correspond to the tracking unit 22 in FIG. 1, and detect the subject from the input image by mutually different tracking methods in response to an instruction from the flatness determination unit 51.

That is, the tracking unit 52 detects the subject to be tracked by motion detection, using the input image of the current frame to be processed, supplied from the determination unit 63, together with the input image and subject region information of the frame immediately preceding the current frame (hereinafter referred to as the previous frame), which are held in the holding unit 54. Here, the subject region information is information indicating the position of the region, detected from the input image, that surrounds the subject to be tracked (hereinafter referred to as the subject region).

The tracking unit 52 also supplies the subject region information of the current frame, obtained as a result of the subject detection, to the display control unit 23.

The tracking unit 53 detects the subject to be tracked from the input image by a subject extraction technique called visual attention, using the input image of the current frame supplied from the determination unit 63 and the subject region information of the previous frame held in the holding unit 54. The tracking unit 53 also supplies the subject region information of the current frame, obtained as a result of the subject detection, to the display control unit 23.

The holding unit 54 holds the supplied input image and the subject region information supplied from the tracking unit 52 or the tracking unit 53, and supplies the input image or the subject region information to the tracking unit 52 or the tracking unit 53 as necessary.

[Configuration of the Tracking Unit 52]
In more detail, the tracking unit 52 of FIG. 2 is configured as shown in FIG. 3.

That is, the tracking unit 52 includes a block motion detection unit 91, a subject motion detection unit 92, and a subject region determination unit 93.

The block motion detection unit 91 divides the subject region on the input image of the previous frame into several blocks, using the input image and subject region information of the previous frame held in the holding unit 54. The block motion detection unit 91 then detects the motion of each block from the blocks obtained by the division and the input image of the current frame from the determination unit 63, and supplies the detection result to the subject motion detection unit 92.

The subject motion detection unit 92 obtains the motion of the subject region as a whole from the per-block motion detection results supplied from the block motion detection unit 91, and supplies it to the subject region determination unit 93. The subject region determination unit 93 identifies the subject region on the input image of the current frame from the motion of the whole subject region supplied from the subject motion detection unit 92 and the subject region information of the previous frame held in the holding unit 54. The subject region determination unit 93 also generates subject region information indicating the position of the subject region on the input image of the current frame, supplies it to the display control unit 23, and supplies it to the holding unit 54 to be held.
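The three stages described above (per-block motion, motion of the whole subject region, new subject region) can be sketched in Python as follows. This is a sketch under stated assumptions, not the disclosed method: the exhaustive search minimizing the sum of absolute differences (SAD), the median used to combine the block motions, the search `radius`, and all function names are choices made for this example.

```python
from statistics import median


def sad(prev, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and its shifted position."""
    return sum(abs(prev[by + y][bx + x] - cur[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def block_motion(prev, cur, bx, by, size, radius):
    """Best (dx, dy) for one block by exhaustive search within `radius`."""
    height, width = len(cur), len(cur[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width and
                      0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue  # skip candidates that fall outside the frame
            cost = sad(prev, cur, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def track_by_motion(prev, cur, box, size=2, radius=2):
    """Shift the previous subject region by the median block motion."""
    x0, y0, w, h = box
    vectors = [block_motion(prev, cur, bx, by, size, radius)
               for by in range(y0, y0 + h - size + 1, size)
               for bx in range(x0, x0 + w - size + 1, size)]
    if not vectors:
        return box  # region smaller than one block: keep the old position
    mx = round(median(v[0] for v in vectors))
    my = round(median(v[1] for v in vectors))
    return (x0 + mx, y0 + my, w, h)
```

The median is used here so that a few badly matched blocks do not dominate the estimate; any other way of combining the block motions would fit the description equally well.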

[Configuration of the Tracking Unit 53]
In more detail, the tracking unit 53 of FIG. 2 is configured as shown in FIG. 4.

That is, the tracking unit 53 includes a subject extraction unit 121, a subject candidate region determination unit 122, and a subject region determination unit 123.

The subject extraction unit 121 generates, from the input image of the current frame supplied from the determination unit 63, a subject map indicating the subject likelihood in each region of the input image, and supplies it to the subject candidate region determination unit 122. Here, the subject indicated by the subject map is an arbitrary subject: an object on the input image that the user is presumed to pay attention to, that is, to look at, when glancing at the input image. The subject is therefore not necessarily limited to a person.

Using the subject map from the subject extraction unit 121, the subject candidate region determination unit 122 extracts regions on the subject map that are candidates for the subject region, that is, rectangular regions containing subject-like regions (hereinafter also referred to as subject candidate regions), and supplies them to the subject region determination unit 123.
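One way to obtain rectangular candidate regions from a subject map is sketched below in Python. The thresholding of the map, the 4-connected grouping of subject-like pixels, and the name `candidate_regions` are assumptions made for illustration; the passage above only states that rectangles containing subject-like regions are extracted.

```python
from collections import deque


def candidate_regions(subject_map, threshold):
    """Bounding boxes (x, y, width, height) of 4-connected subject-like areas."""
    height, width = len(subject_map), len(subject_map[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or subject_map[y][x] < threshold:
                continue
            # Breadth-first search over one connected subject-like area.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and subject_map[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes
```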

Using the subject candidate region extraction result from the subject candidate region determination unit 122 and the subject region information of the previous frame held in the holding unit 54, the subject region determination unit 123 selects one of the subject candidate regions as the region indicating the subject region of the current frame. That is, the region on the input image at the same position as the selected subject candidate region on the subject map is taken as the subject region of the current frame. Hereinafter, the subject candidate region selected as the region indicating the subject region on the subject map is also referred to simply as the subject region.

The subject region determination unit 123 generates subject region information indicating the subject region of the current frame, supplies it to the display control unit 23, and supplies it to the holding unit 54 to be held.

In this way, in the tracking unit 53, any subject-like regions on the subject map that the user is estimated to pay attention to are treated as candidates for the region of the tracking-target subject designated by the user (subject candidate regions). From among these subject candidate regions, the region most likely to be the tracking-target subject is selected as the subject region.

[Configuration of subject extraction unit]
In more detail, the subject extraction unit 121 of FIG. 4 is configured as shown in FIG. 5. That is, the subject extraction unit 121 includes a luminance information extraction unit 151, a color information extraction unit 152, an edge information extraction unit 153, a face information extraction unit 154, and a subject map generation unit 155.

Based on the supplied input image, the luminance information extraction unit 151 generates a luminance information map indicating information about luminance in each region of the input image, and supplies the luminance information map to the subject map generation unit 155.
Based on the supplied input image, the color information extraction unit 152 generates a color information map indicating information about color in each region of the input image, and supplies the color information map to the subject map generation unit 155.

Based on the supplied input image, the edge information extraction unit 153 generates an edge information map indicating information about edges in each region of the input image, and supplies the edge information map to the subject map generation unit 155. Based on the supplied input image, the face information extraction unit 154 generates a face information map indicating information about a human face as a subject in each region of the input image, and supplies the face information map to the subject map generation unit 155.

Hereinafter, when the luminance information map through the face information map output from the luminance information extraction unit 151 through the face information extraction unit 154 need not be distinguished from one another, they are also referred to simply as information maps. The information contained in these information maps indicates the subject-likeness of each region of the input image, obtained from feature quantities of features such as luminance and color. An information map is this information arranged so that it corresponds to the regions of the input image.

The subject map generation unit 155 linearly combines the information maps supplied from the luminance information extraction unit 151 through the face information extraction unit 154 to generate the subject map. That is, the information in each region of the luminance information map through the face information map is weighted and added, region by region at the same position, to generate the subject map. The subject map generation unit 155 supplies the generated subject map to the subject candidate region determination unit 122.
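The weighted addition described above can be sketched as follows. This is a minimal illustration, assuming the information maps are equally sized 2D lists; the function name `combine_information_maps` and the weight values are hypothetical, since the text does not specify how the weights are chosen.

```python
from typing import List, Sequence

Map2D = List[List[float]]


def combine_information_maps(maps: Sequence[Map2D], weights: Sequence[float]) -> Map2D:
    """Linearly combine equally sized information maps into one subject map.

    Each output pixel is the weighted sum of the pixels at the same position
    in every information map. The weights are illustrative assumptions.
    """
    if not maps or len(maps) != len(weights):
        raise ValueError("need at least one map and exactly one weight per map")
    height, width = len(maps[0]), len(maps[0][0])
    subject_map = [[0.0] * width for _ in range(height)]
    for info_map, weight in zip(maps, weights):
        for y in range(height):
            for x in range(width):
                subject_map[y][x] += weight * info_map[y][x]
    return subject_map


# Two toy 2x2 information maps (for example luminance and color).
luminance_map = [[1.0, 2.0], [3.0, 4.0]]
color_map = [[4.0, 8.0], [12.0, 16.0]]
subject_map = combine_information_maps([luminance_map, color_map], [0.5, 0.25])
```

In a fuller sketch the list of maps would also contain the edge maps for each direction and the face map, each with its own weight.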

In the following, each region (position) in an information map or the subject map is referred to as a pixel, and the information (value) corresponding to that region is referred to as a pixel value.

Next, the configurations of the luminance information extraction unit 151 through the face information extraction unit 154 of FIG. 5 will be described in more detail with reference to FIGS. 6 to 9.

[Configuration of luminance information extraction unit]
FIG. 6 is a block diagram illustrating a configuration example of the luminance information extraction unit 151.

The luminance information extraction unit 151 includes a luminance image generation unit 181, a pyramid image generation unit 182, a difference calculation unit 183, and a luminance information map generation unit 184.

Using the supplied input image, the luminance image generation unit 181 generates a luminance image whose pixel values are the luminance values of the pixels of the input image, and supplies the luminance image to the pyramid image generation unit 182. Here, the pixel value of any pixel of the luminance image indicates the luminance value of the pixel of the input image at the same position.

Using the luminance image supplied from the luminance image generation unit 181, the pyramid image generation unit 182 generates a plurality of luminance images with mutually different resolutions, and supplies them to the difference calculation unit 183 as luminance pyramid images.

For example, pyramid images are generated at eight resolution levels, level L1 through level L8. The level L1 pyramid image has the highest resolution, and the resolution decreases in order from level L1 to level L8.

In this case, the luminance image generated by the luminance image generation unit 181 serves as the level L1 pyramid image. The average of the pixel values of four mutually adjacent pixels in the level Li pyramid image (where 1 ≤ i ≤ 7) becomes the pixel value of the one corresponding pixel in the level L(i+1) pyramid image. Accordingly, the level L(i+1) pyramid image has half the height and half the width of the level Li pyramid image (rounded down when not evenly divisible).
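The pyramid construction just described can be sketched as follows. This is an illustration only: it assumes the "four mutually adjacent pixels" form a non-overlapping 2x2 block and that images are 2D lists of numbers. The names `downsample_by_half` and `build_pyramid` are hypothetical.

```python
def downsample_by_half(image):
    """Average each non-overlapping 2x2 block into one pixel.

    Height and width are halved with floor division, so an odd trailing
    row or column is dropped (the "rounded down" case in the text).
    """
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(width)
        ]
        for y in range(height)
    ]


def build_pyramid(level1_image, levels=8):
    """Return [L1, L2, ...], stopping early if an image is too small to halve."""
    pyramid = [level1_image]
    for _ in range(levels - 1):
        previous = pyramid[-1]
        if len(previous) < 2 or len(previous[0]) < 2:
            break
        pyramid.append(downsample_by_half(previous))
    return pyramid


# A toy 4x5 luminance image; the fifth column is dropped at level L2.
luminance = [
    [0, 4, 8, 8, 1],
    [4, 8, 8, 8, 1],
    [1, 1, 2, 2, 1],
    [1, 1, 2, 2, 1],
]
pyramid = build_pyramid(luminance)
```

With a real input image the loop would run through all eight levels; the toy image here is exhausted after level L3.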

From the plurality of pyramid images supplied from the pyramid image generation unit 182, the difference calculation unit 183 selects two pyramid images at different levels and obtains the difference between them to generate a luminance difference image. Because the pyramid images at different levels differ in size (number of pixels), the smaller pyramid image is up-converted to match the larger one when the difference image is generated.
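A difference image between two pyramid levels could be computed along the following lines. The text does not say how the up-conversion is done or whether the difference is signed, so this sketch assumes nearest-neighbour up-conversion and an absolute difference; `upconvert` and `difference_image` are hypothetical names.

```python
def upconvert(image, target_height, target_width):
    """Nearest-neighbour up-conversion (an assumed interpolation method)."""
    src_height, src_width = len(image), len(image[0])
    return [
        [
            image[y * src_height // target_height][x * src_width // target_width]
            for x in range(target_width)
        ]
        for y in range(target_height)
    ]


def difference_image(larger, smaller):
    """Absolute per-pixel difference after matching the smaller image's size."""
    resized = upconvert(smaller, len(larger), len(larger[0]))
    return [
        [abs(a - b) for a, b in zip(row_large, row_small)]
        for row_large, row_small in zip(larger, resized)
    ]


fine_level = [[1, 2, 3, 4], [5, 6, 7, 8]]
coarse_level = [[2, 6]]
diff = difference_image(fine_level, coarse_level)
```

Pixels that differ strongly from their coarser-level surroundings produce large values in `diff`.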

When the difference calculation unit 183 has generated a predetermined number of luminance difference images, it normalizes them and supplies them to the luminance information map generation unit 184. The luminance information map generation unit 184 generates a luminance information map based on the difference images supplied from the difference calculation unit 183, and supplies the luminance information map to the subject map generation unit 155.

[Configuration of color information extraction unit]
FIG. 7 is a block diagram illustrating a configuration example of the color information extraction unit 152 of FIG. 5.

The color information extraction unit 152 includes an RG difference image generation unit 211, a BY difference image generation unit 212, a pyramid image generation unit 213, a pyramid image generation unit 214, a difference calculation unit 215, a difference calculation unit 216, a color information map generation unit 217, and a color information map generation unit 218.

Using the supplied input image, the RG difference image generation unit 211 generates an RG difference image whose pixel values are the differences between the R (red) and G (green) components of the pixels of the input image, and supplies the RG difference image to the pyramid image generation unit 213. The pixel value of any pixel of the RG difference image indicates the difference between the R component and the G component of the pixel of the input image at the same position.

Using the supplied input image, the BY difference image generation unit 212 generates a BY difference image whose pixel values are the differences between the B (blue) and Y (yellow) components of the pixels of the input image, and supplies the BY difference image to the pyramid image generation unit 214. The pixel value of any pixel of the BY difference image indicates the difference between the B (blue) component and the Y (yellow) component of the pixel of the input image at the same position.
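The two color difference images can be illustrated with a short sketch. The text does not define how the Y (yellow) component is derived from the input image, so the code below assumes, purely for illustration, that yellow is approximated by the mean of the R and G components. The function name `rg_by_difference_images` is hypothetical.

```python
def rg_by_difference_images(rgb_image):
    """Return (RG difference image, BY difference image) for an RGB image.

    rgb_image is a 2D list of (r, g, b) tuples. The yellow component is
    approximated as (r + g) / 2, which is an assumption of this sketch
    and is not taken from the source text.
    """
    rg_image, by_image = [], []
    for row in rgb_image:
        rg_row, by_row = [], []
        for r, g, b in row:
            yellow = (r + g) / 2.0  # assumed definition of the Y component
            rg_row.append(r - g)
            by_row.append(b - yellow)
        rg_image.append(rg_row)
        by_image.append(by_row)
    return rg_image, by_image


# One pure-red pixel and one pure-blue pixel.
rg, by = rg_by_difference_images([[(255, 0, 0), (0, 0, 255)]])
```

Each of the two resulting images is then fed to its own pyramid generation and difference calculation chain, in the same way as the luminance image.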

Using the RG difference image and the BY difference image supplied from the RG difference image generation unit 211 and the BY difference image generation unit 212, the pyramid image generation unit 213 and the pyramid image generation unit 214 generate a plurality of RG difference images and BY difference images with mutually different resolutions. The pyramid image generation unit 213 and the pyramid image generation unit 214 then supply the generated images to the difference calculation unit 215 and the difference calculation unit 216 as RG difference pyramid images and BY difference pyramid images, respectively.

For example, as with the luminance pyramid images, pyramid images at eight resolution levels, level L1 through level L8, are generated for each of the RG difference and the BY difference.

From the plurality of pyramid images supplied from the pyramid image generation unit 213 and the pyramid image generation unit 214, the difference calculation unit 215 and the difference calculation unit 216 each select two pyramid images at different levels and obtain the difference between them, generating difference images of the RG difference and difference images of the BY difference. Because the pyramid images at different levels differ in size, the smaller pyramid image is up-converted to the same size as the larger one when a difference image is generated.

When the difference calculation unit 215 and the difference calculation unit 216 have generated a predetermined number of difference images of the RG difference and of the BY difference, they normalize those difference images and supply them to the color information map generation unit 217 and the color information map generation unit 218. The color information map generation unit 217 and the color information map generation unit 218 generate color information maps based on the difference images supplied from the difference calculation unit 215 and the difference calculation unit 216, and supply them to the subject map generation unit 155. The color information map generation unit 217 generates the color information map of the RG difference, and the color information map generation unit 218 generates the color information map of the BY difference.

[Configuration of edge information extraction unit]
FIG. 8 is a block diagram illustrating a configuration example of the edge information extraction unit 153 of FIG. 5.

The edge information extraction unit 153 includes edge image generation units 241 to 244, pyramid image generation units 245 to 248, difference calculation units 249 to 252, and edge information map generation units 253 to 256.

The edge image generation units 241 to 244 apply filtering with a Gabor filter to the supplied input image, generate edge images whose pixel values are the edge strengths in the directions of, for example, 0 degrees, 45 degrees, 90 degrees, and 135 degrees, and supply the edge images to the pyramid image generation units 245 to 248.

For example, the pixel value of any pixel of the edge image generated by the edge image generation unit 241 indicates the edge strength in the 0-degree direction at the pixel of the input image at the same position. The direction of each edge is the direction indicated by the angle component of the Gabor function that constitutes the Gabor filter.
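To make the role of the angle component concrete, the following sketch builds Gabor kernels for the four directions using the standard real-valued Gabor function. The kernel size, `sigma`, `wavelength`, `gamma`, and `psi` values are illustrative assumptions, as the text gives no filter parameters; `gabor_kernel` is a hypothetical helper name.

```python
import math


def gabor_kernel(theta_degrees, size=7, sigma=2.0, wavelength=4.0, gamma=0.5, psi=0.0):
    """Build a real-valued Gabor kernel oriented at theta_degrees.

    g(x, y) = exp(-(x'^2 + (gamma * y')^2) / (2 * sigma^2)) * cos(2*pi*x'/wavelength + psi)
    where (x', y') are the coordinates rotated by theta. size should be odd.
    All parameter defaults are assumptions made for this sketch.
    """
    theta = math.radians(theta_degrees)
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_rot / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# One kernel per edge direction mentioned in the text.
kernels = {angle: gabor_kernel(angle) for angle in (0, 45, 90, 135)}
```

Convolving the input image with each kernel and taking the response magnitude would give one edge image per direction; that convolution step is omitted here.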

Using the edge images in the respective directions supplied from the edge image generation units 241 to 244, the pyramid image generation units 245 to 248 generate a plurality of edge images with mutually different resolutions. The pyramid image generation units 245 to 248 then supply the generated edge images in the respective directions to the difference calculation units 249 to 252 as pyramid images for the respective edge directions.

For example, as with the luminance pyramid images, pyramid images at eight levels, level L1 through level L8, are generated for each edge direction.

From the plurality of pyramid images supplied from the pyramid image generation units 245 to 248, the difference calculation units 249 to 252 each select two pyramid images at different levels and obtain the difference between them to generate difference images for the respective edge directions. Because the pyramid images at different levels differ in size, the smaller pyramid image is up-converted when a difference image is generated.

When the difference calculation units 249 to 252 have generated a predetermined number of difference images for the respective edge directions, they normalize those difference images and supply them to the edge information map generation units 253 to 256. The edge information map generation units 253 to 256 generate edge information maps for the respective directions based on the difference images supplied from the difference calculation units 249 to 252, and supply them to the subject map generation unit 155.

[Configuration of face information extraction unit]
FIG. 9 is a block diagram illustrating a configuration example of the face information extraction unit 154 of FIG. 5.

The face information extraction unit 154 includes a face detection unit 281 and a face information map generation unit 282.

The face detection unit 281 detects the region of a human face as a subject from the supplied input image, and supplies the detection result to the face information map generation unit 282. The face information map generation unit 282 generates a face information map based on the detection result from the face detection unit 281, and supplies the face information map to the subject map generation unit 155.

[Description of tracking process]
When the imaging apparatus captures a plurality of input images in temporal succession and each input image is supplied to the image processing apparatus 11, the display control unit 23 supplies the input image to the display unit 24 for display. If the user then operates the imaging apparatus (image processing apparatus 11) to request display of a subject frame and designates the region of the input image in which the subject to be tracked appears, the image processing apparatus 11 starts the tracking process. That is, the image processing apparatus 11 detects the tracking-target subject from the input image and displays a subject frame on the input image.

The tracking process performed by the image processing apparatus 11 of FIG. 2 will now be described with reference to the flowchart of FIG. 10.

In step S11, the flatness determination unit 51 performs the flatness calculation process and calculates the flatness of the supplied input image of the current frame. The calculated flatness is supplied from the flatness calculation unit 62 to the determination unit 63. The flatness calculation process is described in detail later.

In step S12, the determination unit 63 determines whether the input image of the current frame is a flat image, based on the flatness supplied from the flatness calculation unit 62.

Specifically, the input image is determined to be a flat image when the flatness is greater than a predetermined threshold thA. Here, the flatness is the number of blocks of the input image whose variance is less than a predetermined threshold thB.

In a block whose variance is reasonably small, the pixel values change little in the spatial direction, so the block should show a subject with a flat pattern, such as the sky. Therefore, when the input image contains at least a certain number of such flat blocks, the input image as a whole can be regarded as a flat image showing subjects with little texture.
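The flatness test of steps S11 and S12 can be sketched as follows. The block size, the use of population variance, and the choice to ignore partial blocks at the image border are assumptions of this sketch; the function names and the parameters `th_a` and `th_b` (standing for thA and thB) are hypothetical.

```python
def block_variances(image, block_size):
    """Variance of the pixel values in each full block_size x block_size block.

    Partial blocks at the right and bottom edges are ignored (an assumption).
    """
    height, width = len(image), len(image[0])
    variances = []
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            pixels = [
                image[y][x]
                for y in range(top, top + block_size)
                for x in range(left, left + block_size)
            ]
            mean = sum(pixels) / len(pixels)
            variances.append(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return variances


def flatness(image, block_size, th_b):
    """Flatness = number of blocks whose variance is below th_b."""
    return sum(1 for v in block_variances(image, block_size) if v < th_b)


def is_flat(image, block_size, th_a, th_b):
    """The image is judged flat when the flatness exceeds th_a."""
    return flatness(image, block_size, th_b) > th_a


# Left half uniform (flat blocks), right half a high-contrast checker pattern.
image = [
    [10, 10, 0, 100],
    [10, 10, 100, 0],
    [10, 10, 0, 100],
    [10, 10, 100, 0],
]
flat_block_count = flatness(image, block_size=2, th_b=1.0)
```

Here two of the four blocks are flat, so the image counts as flat for `th_a=1` but not for `th_a=2`.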

If it is determined in step S12 that the image is not flat, the determination unit 63 supplies the input image of the current frame to the block motion detection unit 91 of the tracking unit 52 and instructs the tracking unit 52 to detect the subject region, and the process proceeds to step S13.

When the input image as a whole is not flat, that is, when the subjects in the input image have a certain amount of texture, the motion of the subject between frames can be detected accurately by block matching or a similar technique. Therefore, when the input image of the current frame is not flat, the determination unit 63 selects the tracking unit 52, which detects the subject region by motion detection, and has it detect the subject region.

In step S13, the tracking unit 52 performs the subject detection process based on motion detection and detects the region of the tracking-target subject from the input image of the current frame supplied from the determination unit 63. Having detected the region of the tracking-target subject, the tracking unit 52 generates subject region information indicating the detection result, supplies it to the display control unit 23, and supplies it to the holding unit 54 to be held.

Once the subject region has been detected by motion detection, the process proceeds to step S15. The subject detection process based on motion detection is described in detail later.

If it is determined in step S12 that the image is flat, the determination unit 63 supplies the input image of the current frame to the subject extraction unit 121 of the tracking unit 53 and instructs the tracking unit 53 to detect the subject region, and the process proceeds to step S14.

When the input image as a whole has a flat pattern, processing such as block matching cannot locate the subject accurately, so tracking based on motion detection has difficulty following the subject stably.

In contrast, subject detection using visual attention detects the subject from the input image using feature quantities of multiple features such as luminance, color, and edges, so the subject can be detected accurately even in a flat image.

Specifically, in visual attention, the subject is taken to be the object to which the user directs their gaze when glancing at the input image. In general, the subject to be tracked is almost always in the foreground, while in the input image as a whole the background usually occupies a larger area than the foreground. Therefore, if the input image as a whole, that is, the background, is flat, the user's eyes are unlikely to be drawn to the background, so the tracking-target subject in the foreground can be captured easily.

Moreover, because visual attention uses color information to generate the subject map, the subject can be detected with sufficient accuracy even when the foreground as well as the background has a flat pattern, provided there is reasonably strong contrast between the foreground and the background.

Therefore, when the input image is flat, the determination unit 63 selects the tracking unit 53, which performs subject detection using visual attention and can detect a subject in a flat image more accurately than motion detection can, and has it detect the subject region.

Note that in subject detection using visual attention, if the background of the input image is not flat, the user's eyes may also be drawn to the background when glancing at the input image, so a background subject that is not the tracking target may be detected as the tracking target. When the input image as a whole is not flat, tracking based on visual attention may detect the subject less accurately than when the whole image is flat. Subject detection based on motion detection is therefore better suited to input images that are not flat.

In step S14, the tracking unit 53 performs the subject detection process based on visual attention and detects the region of the tracking-target subject from the input image of the current frame supplied from the determination unit 63. Having detected the region of the tracking-target subject, the tracking unit 53 generates subject region information indicating the detection result, supplies it to the display control unit 23, and supplies it to the holding unit 54 to be held.

Once the tracking unit 53 has detected the subject region, the process proceeds to step S15. The subject detection process based on visual attention is described in detail later.

When the subject region has been detected from the input image of the current frame in step S13 or step S14, the display control unit 23, in step S15, causes the display unit 24 to display the input image of the current frame with a subject frame displayed on it.

That is, based on the subject region information supplied from the tracking unit 52 or the tracking unit 53, the display control unit 23 processes the input image so that the subject frame is displayed at the position on the input image indicated by the subject region information. The display control unit 23 then supplies the processed input image to the display unit 24 for display.

At the start of the tracking process, that is, in the first iteration, the region designated by the user is taken as the subject region of the current frame, and the subject region information indicating that region and the input image are held in the holding unit 54. In this case, steps S11 to S14 are not performed; only the display of the input image and the subject frame is performed.

In step S16, the image processing apparatus 11 determines whether to end the process of displaying the subject frame. For example, when the user instructs the apparatus to end the process, it is determined that the process is to end.

If it is determined in step S16 that the process is not to end, the process returns to step S11 and the processing described above is repeated. That is, the subject region is detected from the input image of the next frame, and the subject frame is displayed together with that input image.

If, on the other hand, it is determined in step S16 that the process is to end, each unit of the image processing apparatus 11 stops the processing it is performing, and the tracking process ends.

このようにして、画像処理装置１１は、フレームごとに、入力画像が平坦であるか否かを判定し、その判定結果に応じて、動き検出、またはビジュアルアテンションの何れかを利用した方法により入力画像から被写体を検出し、被写体枠を表示させる。 In this manner, for each frame, the image processing apparatus 11 determines whether or not the input image is flat and, according to the result, detects the subject from the input image by a method using either motion detection or visual attention, and displays the subject frame.
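
The per-frame choice between the two trackers can be sketched as follows. This is only an illustration of the control flow described above, not the apparatus itself: the function `track`, the `Rect` layout, and the three callables (`frame_is_flat`, `by_motion`, `by_attention`) are hypothetical names introduced here, and how a frame is judged flat is left to the caller.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")
Rect = Tuple[int, int, int, int]  # hypothetical layout: (left, top, width, height)


def track(
    frames: Iterable[Frame],
    user_region: Rect,
    frame_is_flat: Callable[[Frame], bool],
    by_motion: Callable[[Frame, Frame, Rect], Rect],
    by_attention: Callable[[Frame, Rect], Rect],
) -> Iterator[Rect]:
    """Yield the subject region for each frame, choosing a tracker per frame."""
    prev_frame = None
    region = user_region
    first = True
    for frame in frames:
        if first:
            # First pass: the user-specified region is used as is.
            first = False
        elif frame_is_flat(frame):
            # Flat frame: visual-attention tracker (tracking unit 53).
            region = by_attention(frame, region)
        else:
            # Non-flat frame: motion-detection tracker (tracking unit 52).
            region = by_motion(prev_frame, frame, region)
        prev_frame = frame
        yield region
```

Passing the trackers in as callables keeps the sketch independent of how each detection method is implemented.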

このように、入力画像が平坦であるかといった、入力画像の特性を特定し、特定された特性を有する画像を得意とするトラッキング方法により、入力画像から被写体を検出するようにしたので、より迅速に、かつより安定して被写体をトラッキングすることができる。 In this way, a characteristic of the input image, such as whether the input image is flat, is identified, and the subject is detected from the input image by the tracking method suited to images with that characteristic. The subject can therefore be tracked more quickly and more stably.

例えば、ビジュアルアテンションや動き検出では、Lucas-Kanadeアルゴリズムを利用して特徴点を検出する場合よりも処理量が少なくてすむので、より迅速に被写体を検出することができる。しかも、画像処理装置１１では、被写体の検出にシリコン網膜や深度カメラといった特殊な器具が不要であるため、より簡単にトラッキングを行なうことができ、一般的な撮像装置等にも、容易に実装することが可能である。 For example, visual attention and motion detection require less processing than detecting feature points with the Lucas-Kanade algorithm, so the subject can be detected more quickly. In addition, the image processing apparatus 11 does not need special equipment such as a silicon retina or a depth camera to detect the subject, so tracking can be performed more simply, and the method can easily be implemented in a general imaging apparatus or the like.

［平坦度算出処理の説明］ [Description of flatness calculation processing]
次に、図１１のフローチャートを参照して、図１０のステップＳ１１に対応する平坦度算出処理について説明する。 Next, the flatness calculation process corresponding to step S11 of FIG. 10 will be described with reference to the flowchart of FIG. 11.

ステップＳ４１において、分散値算出部６１は、供給された現フレームの入力画像を複数のブロックに分割する。 In step S41, the variance value calculation unit 61 divides the supplied input image of the current frame into a plurality of blocks.

ステップＳ４２において、分散値算出部６１は、入力画像上の１つのブロックを選択し、そのブロック内の画素の画素値を用いて、画素値の分散値を算出する。 In step S42, the variance value calculation unit 61 selects one block on the input image, and calculates the variance value of the pixel values using the pixel values of the pixels in the block.

すなわち、分散値算出部６１は、ブロック内の各画素について、その画素の画素値と、ブロック内の画素の画素値の平均値との差分の２乗値を求め、それらの画素ごとの２乗値の総和をブロック内の全画素数で除算することにより、ブロックの分散値を求める。分散値算出部６１は、算出した分散値を平坦度算出部６２に供給する。 That is, the variance value calculation unit 61 obtains, for each pixel in the block, a square value of the difference between the pixel value of the pixel and the average value of the pixel values of the pixels in the block, and squares for each pixel. By dividing the sum of the values by the total number of pixels in the block, the variance value of the block is obtained. The variance value calculation unit 61 supplies the calculated variance value to the flatness calculation unit 62.

ステップＳ４３において、平坦度算出部６２は、分散値算出部６１から供給されたブロックの分散値が予め定められた閾値ｔｈＢ未満であるか否か、すなわち処理対象のブロックが平坦であるか否かを判定する。 In step S43, the flatness calculation unit 62 determines whether or not the variance value of the block supplied from the variance value calculation unit 61 is less than a predetermined threshold thB, that is, whether or not the block being processed is flat.

ブロックの分散値がある程度小さい場合、ブロック内の画素の画素値は、ばらつきが少ないため、そのブロックは平坦な絵柄であるといえる。そのため、分散値が閾値ｔｈＢ未満であるブロックは、平坦なブロックであるとされる。 When the variance value of the block is small to some extent, it can be said that the pixel value of the pixels in the block has a small variation, and thus the block has a flat picture. Therefore, a block having a variance value less than the threshold thB is regarded as a flat block.

ステップＳ４３において、分散値が閾値ｔｈＢ未満であると判定された場合、つまり処理対象のブロックが平坦なものである場合、ステップＳ４４において、平坦度算出部６２は、保持している平坦度に１を加算する。 If it is determined in step S43 that the variance value is less than the threshold thB, that is, if the block being processed is flat, then in step S44 the flatness calculation unit 62 adds 1 to the flatness it holds.

すなわち、平坦度算出部６２は、処理対象のフレームの入力画像について、その入力画像を構成する平坦なブロックの数を示す平坦度を保持しており、この平坦度は、新たなフレームの入力画像が供給されるたびに「０」に初期化される。平坦度算出部６２は、平坦なブロックが検出されると、保持している平坦度に１を加算する。 That is, for the input image of the frame being processed, the flatness calculation unit 62 holds a flatness indicating the number of flat blocks in that input image, and this flatness is initialized to “0” each time the input image of a new frame is supplied. When a flat block is detected, the flatness calculation unit 62 adds 1 to the flatness it holds.

したがって、入力画像の全てのブロックが処理対象とされたとき、平坦度により示される数は、処理対象のフレームの入力画像における、平坦なブロックの数と等しくなり、この平坦度を指標とすれば、入力画像全体が平坦であるかを特定することができる。 Therefore, when all the blocks of the input image are processed, the number indicated by the flatness is equal to the number of flat blocks in the input image of the processing target frame, and this flatness is used as an index. It is possible to specify whether the entire input image is flat.

ステップＳ４４において、平坦度に１が加算されると、その後、処理はステップＳ４５に進む。 If 1 is added to the flatness in step S44, then the process proceeds to step S45.

一方、ステップＳ４３において、分散値が閾値ｔｈＢ以上であると判定された場合、つまり処理対象のブロックが平坦ではないと判定された場合、平坦度は更新されないので、ステップＳ４４の処理は行われず、処理はステップＳ４５へと進む。 On the other hand, if it is determined in step S43 that the variance is greater than or equal to the threshold thB, that is, if it is determined that the block to be processed is not flat, the flatness is not updated, so the process of step S44 is not performed. The process proceeds to step S45.

ステップＳ４４において平坦度が更新されたか、またはステップＳ４３において分散値が閾値ｔｈＢ以上であると判定されると、ステップＳ４５において、平坦判定部５１は、入力画像上の全てのブロックが処理対象とされたか否かを判定する。 When the flatness has been updated in step S44, or when it is determined in step S43 that the variance value is greater than or equal to the threshold thB, the flatness determination unit 51 determines in step S45 whether or not all the blocks on the input image have been processed.

ステップＳ４５において、まだ全てのブロックが処理対象とされていないと判定された場合、処理はステップＳ４２に戻り、上述した処理が繰り返される。すなわち、次のブロックが処理対象とされて、平坦度が更新される。 If it is determined in step S45 that not all blocks have been processed, the process returns to step S42 and the above-described process is repeated. That is, the next block is set as a processing target, and the flatness is updated.

これに対して、ステップＳ４５において、全てのブロックが処理対象とされたと判定された場合、平坦度算出部６２は、保持している平坦度を、現フレームの入力画像の最終的な平坦度として判定部６３に供給し、平坦度算出処理は終了する。そして、その後、処理は図１０のステップＳ１２へと進む。 On the other hand, when it is determined in step S45 that all the blocks have been processed, the flatness calculation unit 62 supplies the flatness it holds to the determination unit 63 as the final flatness of the input image of the current frame, and the flatness calculation process ends. The process then proceeds to step S12 in FIG. 10.

このようにして、平坦判定部５１は、現フレームの入力画像について、その入力画像の特性を示す平坦度を算出する。これにより、入力画像の特性を特定し、より適切なトラッキング方法を選択することができるようになる。 In this way, the flatness determination unit 51 calculates the flatness indicating the characteristics of the input image for the input image of the current frame. As a result, the characteristics of the input image can be specified and a more appropriate tracking method can be selected.
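
Steps S41 to S45 can be sketched in a few lines. This is a minimal illustration, assuming a grayscale image stored as a list of rows, square blocks of a caller-chosen `block_size`, and that partial blocks at the right and bottom edges are skipped; none of these details are fixed by the description, and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def block_variance(image: Image, top: int, left: int, size: int) -> float:
    """Variance of one block: mean squared deviation from the block mean."""
    pixels = [image[y][x]
              for y in range(top, top + size)
              for x in range(left, left + size)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def flatness(image: Image, block_size: int, th_b: float) -> int:
    """Count the blocks whose variance is below th_b (steps S41 to S45)."""
    height, width = len(image), len(image[0])
    count = 0  # the flatness starts from 0 for every new frame
    # Assumption: partial blocks at the right and bottom edges are skipped.
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            if block_variance(image, top, left, block_size) < th_b:
                count += 1
    return count
```

For example, a 4 × 4 image whose left half is uniform and whose right half is a checkerboard gives a flatness of 2 with 2 × 2 blocks.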

［動き検出による被写体検出処理の説明］ [Description of subject detection processing by motion detection]
次に、図１２のフローチャートを参照して、図１０のステップＳ１３の処理に対応する動き検出による被写体検出処理について説明する。 Next, the subject detection process by motion detection, which corresponds to the process of step S13 of FIG. 10, will be described with reference to the flowchart of FIG. 12.

ステップＳ７１において、ブロック動き検出部９１は、保持部５４から前フレームの入力画像と被写体領域情報を取得して、前フレームの入力画像における被写体領域情報により示される被写体領域を複数のブロックに分割する。 In step S71, the block motion detection unit 91 acquires the input image and subject region information of the previous frame from the holding unit 54, and divides the subject region indicated by the subject region information in the input image of the previous frame into a plurality of blocks.

ステップＳ７２において、ブロック動き検出部９１は、分割された各ブロックについて、ブロックと、供給された現フレームの入力画像とを用いて各ブロックの動きを検出する。 In step S72, the block motion detecting unit 91 detects the motion of each block using the block and the supplied input image of the current frame for each divided block.

例えば、ブロック動き検出部９１は、現フレームの入力画像と、処理対象のブロックとを用いたブロックマッチングにより、現フレームの入力画像上において、処理対象のブロックと最も相関の高い（類似の度合いの高い）領域を検索する。そして、ブロック動き検出部９１は、検索の結果に基づいて、フレーム間のブロックの動きとして、処理対象のブロックの動きベクトルを求める。 For example, the block motion detection unit 91 performs block matching between the input image of the current frame and the block being processed, searching the input image of the current frame for the region most highly correlated with (most similar to) that block. Based on the search result, the block motion detection unit 91 then obtains the motion vector of the block as its motion between frames.

これにより、例えば図１３の図中、左側に示すように、前フレームの入力画像Ｐ（ｎ−１）上の被写体領域ＳＲ（ｎ−１）が複数のブロックに分割され、各ブロックの動きベクトルが求められる。図１３の例では、被写体領域ＳＲ（ｎ−１）は、縦４×横４の合計１６個のブロックに分割されており、これらのブロック内の矢印は、各ブロックの動きベクトルを表している。 As a result, as shown on the left side of FIG. 13, for example, the subject region SR (n−1) on the input image P (n−1) of the previous frame is divided into a plurality of blocks, and the motion vector of each block is obtained. In the example of FIG. 13, the subject region SR (n−1) is divided into 4 vertical × 4 horizontal blocks, 16 in total, and the arrow in each block represents that block's motion vector.

ブロック動き検出部９１は、各ブロックの動きベクトルを求めると、それらの動きベクトルを被写体動き検出部９２に供給する。 When the block motion detection unit 91 obtains the motion vector of each block, the block motion detection unit 91 supplies the motion vector to the subject motion detection unit 92.

ステップＳ７３において、被写体動き検出部９２は、ブロック動き検出部９１から供給された各ブロックの動きベクトルを用いて、被写体領域全体の動きを検出する。例えば、被写体動き検出部９２は、被写体領域の動きとして、各ブロックの動きベクトルの平均を求め、得られた動きベクトルを被写体領域決定部９３に供給する。 In step S 73, the subject motion detection unit 92 detects the motion of the entire subject region using the motion vector of each block supplied from the block motion detection unit 91. For example, the subject motion detection unit 92 obtains the average of the motion vectors of each block as the motion of the subject region, and supplies the obtained motion vector to the subject region determination unit 93.

ステップＳ７４において、被写体領域決定部９３は、被写体動き検出部９２から供給された被写体領域全体の動きを示す動きベクトルと、保持部５４に保持されている前フレームの被写体領域情報とから、現フレームの入力画像上の被写体領域を特定する。 In step S 74, the subject area determination unit 93 determines the current frame from the motion vector indicating the movement of the entire subject area supplied from the subject motion detection unit 92 and the subject area information of the previous frame held in the holding unit 54. The subject area on the input image is specified.

具体的には、例えば図１３に示すように、被写体領域ＳＲ（ｎ−１）内の各ブロックの動きベクトルの平均が求められ、図中、右側に示されるように、被写体領域ＳＲ（ｎ−１）全体の動きを示す動きベクトルＶ（ｎ−１）が得られたとする。 Specifically, for example, as shown in FIG. 13, suppose that the average of the motion vectors of the blocks in the subject region SR (n−1) is obtained, yielding the motion vector V (n−1) that indicates the motion of the entire subject region SR (n−1), as shown on the right side of the figure.

ここで、図中、横方向および縦方向をそれぞれｘ方向およびｙ方向とし、各ブロックの動きベクトルがｘ成分およびｙ成分からなるとする。このとき、ブロックの動きベクトルのｘ成分およびｙ成分の平均値が、それぞれ動きベクトルＶ（ｎ−１）のｘ成分およびｙ成分とされる。 Here, in the figure, it is assumed that the horizontal direction and the vertical direction are the x direction and the y direction, respectively, and the motion vector of each block consists of an x component and a y component. At this time, the average values of the x and y components of the motion vector of the block are set as the x and y components of the motion vector V (n−1), respectively.

被写体領域決定部９３は、このようにして得られた動きベクトルＶ（ｎ−１）と、前フレームの被写体領域情報により示される被写体領域ＳＲ（ｎ−１）の位置とから、現フレームの被写体領域ＳＲ（ｎ）を特定する。すなわち、被写体領域決定部９３は、現フレームの入力画像Ｐ（ｎ）上において、被写体領域ＳＲ（ｎ−１）と同じ位置の領域を、動きベクトルＶ（ｎ−１）の方向に、動きベクトルＶ（ｎ−１）の大きさだけ移動させ、移動後の領域を被写体領域ＳＲ（ｎ）とする。 The subject area determination unit 93 determines the subject of the current frame from the motion vector V (n−1) obtained in this way and the position of the subject area SR (n−1) indicated by the subject area information of the previous frame. Region SR (n) is specified. That is, the subject region determination unit 93 moves the region at the same position as the subject region SR (n−1) on the input image P (n) of the current frame in the direction of the motion vector V (n−1). The region is moved by the size of V (n−1), and the region after the movement is defined as a subject region SR (n).

前フレームの被写体領域ＳＲ（ｎ−１）には、追尾対象の被写体が含まれているから、その被写体領域全体のフレーム間の動きは、追尾対象の被写体のフレーム間の動きとなる。したがって、現フレームの入力画像上では、入力画像上の前フレームの被写体領域と同じ位置から、その被写体領域の動きの分だけ離れた位置にある領域内に、追尾対象の被写体が存在するはずである。 Since the subject area SR (n−1) of the previous frame contains the subject to be tracked, the motion of the entire subject area between frames is the motion of that subject between frames. Therefore, on the input image of the current frame, the subject to be tracked should lie in the area displaced from the position of the previous frame's subject area by the motion of that subject area.

そこで、被写体領域決定部９３は、現フレームの入力画像において、前フレームの被写体領域と同じ位置から、被写体領域の動きの分だけ離れた位置の領域を、現フレームの被写体領域とする。 Therefore, the subject region determination unit 93 sets a region at a position away from the same position as the subject region of the previous frame by the amount of movement of the subject region in the input image of the current frame as the subject region of the current frame.

ステップＳ７５において、被写体領域決定部９３は、特定された現フレームの被写体領域の位置を示す被写体領域情報を生成し、表示制御部２３に供給するとともに、現フレームの被写体領域情報を保持部５４に供給し、保持させる。 In step S75, the subject area determination unit 93 generates subject area information indicating the position of the identified subject area of the current frame, supplies the subject area information to the display control unit 23, and stores the subject area information of the current frame in the holding unit 54. Supply and hold.

現フレームの被写体領域情報が生成されると、動き検出による被写体検出処理は終了し、その後、処理は図１０のステップＳ１５に進む。 When the subject area information of the current frame is generated, the subject detection process by motion detection ends, and the process then proceeds to step S15 in FIG. 10.

このようにして、トラッキング部５２は、被写体領域のフレーム間の動きを検出することで、現フレームの被写体領域を検出する。このように、フレーム間の動きを利用して、現フレームにおける被写体の位置を検出すれば、入力画像が平坦でない、ある程度起伏のある画像である場合には、より高い精度で被写体を検出することができる。 In this way, the tracking unit 52 detects the subject area of the current frame by detecting the motion of the subject area between frames. Using the motion between frames to locate the subject in the current frame allows the subject to be detected with higher accuracy when the input image is not flat but has a certain amount of variation.
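
Steps S71 to S74 can be sketched as below. The description only says that the most highly correlated region is searched for by block matching; the sum of absolute differences (SAD), the search range `search`, the tie-break toward shorter vectors, the rounding of the averaged vector, and the `Rect` layout are all assumptions made for this illustration.

```python
from typing import List, Tuple

Image = List[List[int]]            # grayscale frame as rows of pixel values
Rect = Tuple[int, int, int, int]   # hypothetical layout: (left, top, width, height)


def block_motion(prev: Image, cur: Image, bx: int, by: int,
                 bw: int, bh: int, search: int) -> Tuple[int, int]:
    """Motion vector (dx, dy) of one block, found by exhaustive SAD matching."""
    height, width = len(cur), len(cur[0])
    # Try small displacements first so that ties favour the shorter vector.
    offsets = sorted(((dx, dy) for dy in range(-search, search + 1)
                      for dx in range(-search, search + 1)),
                     key=lambda v: abs(v[0]) + abs(v[1]))
    best, best_cost = (0, 0), None
    for dx, dy in offsets:
        x0, y0 = bx + dx, by + dy
        if x0 < 0 or y0 < 0 or x0 + bw > width or y0 + bh > height:
            continue  # candidate window falls outside the current frame
        cost = sum(abs(prev[by + y][bx + x] - cur[y0 + y][x0 + x])
                   for y in range(bh) for x in range(bw))
        if best_cost is None or cost < best_cost:
            best, best_cost = (dx, dy), cost
    return best


def track_by_motion(prev: Image, cur: Image, region: Rect,
                    blocks_per_side: int = 4, search: int = 2) -> Rect:
    """Split SR(n-1) into blocks, average their vectors, and shift the region."""
    left, top, w, h = region
    bw, bh = w // blocks_per_side, h // blocks_per_side
    vectors = [block_motion(prev, cur, left + i * bw, top + j * bh, bw, bh, search)
               for j in range(blocks_per_side) for i in range(blocks_per_side)]
    # V(n-1): component-wise mean of the block motion vectors.
    mean_dx = sum(v[0] for v in vectors) / len(vectors)
    mean_dy = sum(v[1] for v in vectors) / len(vectors)
    # Rounding to whole pixels is an assumption of this sketch.
    return (left + round(mean_dx), top + round(mean_dy), w, h)
```

Because every block in a flat region matches equally well at many displacements, a matcher of this kind is only reliable on textured frames, which is consistent with using it for non-flat input images.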

［ビジュアルアテンションによる被写体検出処理の説明］ [Description of subject detection processing by visual attention]
さらに、図１４のフローチャートを参照して、図１０のステップＳ１４の処理に対応するビジュアルアテンションによる被写体検出処理について説明する。 Further, the subject detection process by visual attention, which corresponds to the process of step S14 of FIG. 10, will be described with reference to the flowchart of FIG. 14.

ステップＳ１０１において、被写体抽出部１２１は、判定部６３から供給された現フレームの入力画像を用いて被写体マップ生成処理を行い、入力画像の各領域における被写体らしさを示す被写体マップを生成し、被写体候補領域決定部１２２に供給する。この被写体マップでは、画素の画素値が大きい領域ほど、被写体らしい領域であることを示している。なお、被写体マップ生成処理の詳細は後述する。 In step S101, the subject extraction unit 121 performs subject map generation processing using the input image of the current frame supplied from the determination unit 63, generates a subject map indicating the subject likelihood of each region of the input image, and supplies it to the subject candidate region determination unit 122. In this subject map, a region whose pixels have larger pixel values is more likely to be a subject. Details of the subject map generation process will be described later.

ステップＳ１０２において、被写体候補領域決定部１２２は、被写体抽出部１２１から供給された被写体マップを、予め定められた閾値ｔｈＣを用いた閾値処理により２値化する。具体的には、被写体候補領域決定部１２２は、被写体マップの画素の画素値が閾値ｔｈＣ以上であれば、その画素の画素値を「１」とし、被写体マップの画素の画素値が閾値ｔｈＣ未満であれば、その画素の画素値を「０」とする。 In step S102, the subject candidate area determination unit 122 binarizes the subject map supplied from the subject extraction unit 121 by threshold processing using a predetermined threshold thC. Specifically, if the pixel value of the pixel of the subject map is equal to or greater than the threshold thC, the subject candidate area determination unit 122 sets the pixel value of the pixel to “1” and the pixel value of the pixel of the subject map is less than the threshold thC. If so, the pixel value of the pixel is set to “0”.

２値化された被写体マップにおいては、画素値が「１」である画素が、被写体らしい領域であり、画素値が「０」である画素は、被写体ではない領域（例えば、背景の領域）であるとされる。つまり、２値化後の被写体マップは、入力画像における被写体らしい領域を示している。 In the binarized subject map, a pixel with a pixel value “1” is a region that seems to be a subject, and a pixel with a pixel value “0” is a region that is not a subject (for example, a background region). It is supposed to be. That is, the binarized subject map indicates a region that seems to be a subject in the input image.

ステップＳ１０３において、被写体候補領域決定部１２２は、２値化された被写体マップに対して矩形化処理を行い、被写体マップ上における被写体領域の候補となる被写体候補領域を抽出する。 In step S103, the subject candidate region determination unit 122 performs a rectangular process on the binarized subject map, and extracts subject candidate regions that are subject region candidates on the subject map.

具体的には、被写体候補領域決定部１２２は、２値化後の被写体マップにおいて、互いに隣接する、画素値が１である画素からなる領域を検出し、検出された領域を囲む矩形の領域を、被写体候補領域とする。画素値が「１」である画素からなる領域は、１つの被写体全体の領域を表しているため、この領域が、追尾対象の被写体が含まれる被写体領域の候補となる被写体候補領域とされる。 Specifically, in the binarized subject map, the subject candidate region determination unit 122 detects each region made up of mutually adjacent pixels whose pixel value is 1, and takes the rectangular region enclosing the detected region as a subject candidate region. Since a region of pixels with the pixel value “1” represents the whole of one subject, it is treated as a subject candidate region, that is, a candidate for the subject region containing the subject to be tracked.

被写体候補領域決定部１２２は、被写体候補領域を抽出すると、各被写体候補領域の位置を示す情報を被写体領域決定部１２３に供給する。 When the subject candidate region determination unit 122 extracts the subject candidate region, the subject candidate region determination unit 122 supplies information indicating the position of each subject candidate region to the subject region determination unit 123.
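
Steps S102 and S103 amount to thresholding followed by bounding-box extraction for each connected group of 1-pixels. The sketch below is one way to do this, assuming that "adjacent" means 4-connected neighbours and using a hypothetical inclusive `(left, top, right, bottom)` rectangle layout; the description does not specify either.

```python
from collections import deque
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # hypothetical layout: (left, top, right, bottom), inclusive


def binarize(subject_map: List[List[float]], th_c: float) -> List[List[int]]:
    """Step S102: 1 where the subject map is at least th_c, otherwise 0."""
    return [[1 if value >= th_c else 0 for value in row] for row in subject_map]


def candidate_rects(binary: List[List[int]]) -> List[Rect]:
    """Step S103: bounding rectangle of every connected group of 1-pixels."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    rects = []
    for sy in range(height):
        for sx in range(width):
            if binary[sy][sx] != 1 or seen[sy][sx]:
                continue
            left = right = sx
            top = bottom = sy
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                # Assumption: "adjacent" means 4-connected neighbours.
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            rects.append((left, top, right, bottom))
    return rects
```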

ステップＳ１０４において、被写体領域決定部１２３は、保持部５４に保持されている前フレームの被写体領域情報を用いて、被写体候補領域決定部１２２から供給された情報に示される被写体候補領域のうちの何れかを、現フレームの被写体領域として選択する。 In step S104, the subject area determination unit 123 uses the subject area information of the previous frame held in the holding unit 54 to select one of the subject candidate areas indicated by the information supplied from the subject candidate area determination unit 122 as the subject area of the current frame.

例えば、図１５の左側に示すように、前フレームの入力画像Ｐ（ｎ−１）上のほぼ中央に被写体領域ＳＲ（ｎ−１）が検出されたとする。この場合、被写体領域決定部１２３は、現フレームの入力画像Ｐ（ｎ）上において、前フレームの被写体領域情報により示される被写体領域ＳＲ（ｎ−１）の中心と同じ位置を中心位置Ｃ（ｎ−１）とする。 For example, as shown on the left side of FIG. 15, suppose that the subject region SR (n−1) was detected at approximately the center of the input image P (n−1) of the previous frame. In this case, the subject area determination unit 123 takes, on the input image P (n) of the current frame, the same position as the center of the subject region SR (n−1) indicated by the subject area information of the previous frame as the center position C (n−1).

そして、被写体領域決定部１２３は、入力画像Ｐ（ｎ）において、現フレームの被写体候補領域のうち、中心位置Ｃ（ｎ−１）が含まれる被写体候補領域を、現フレームの被写体領域として選択する。例えば、図中、右側の例では、入力画像Ｐ（ｎ）上の中心位置Ｃ（ｎ−１）が含まれる被写体候補領域Ｓ（ｎ）が、現フレームの被写体領域として選択される。すなわち、被写体領域として選択された被写体候補領域Ｓ（ｎ）は、被写体候補領域のうち、現フレームの入力画像Ｐ（ｎ）上における、最も前フレームの被写体領域と同じ位置から近い被写体候補領域である。 Then, from the subject candidate regions of the current frame in the input image P (n), the subject region determination unit 123 selects the one that contains the center position C (n−1) as the subject region of the current frame. For example, in the example on the right side of the figure, the subject candidate region S (n) containing the center position C (n−1) on the input image P (n) is selected as the subject region of the current frame. That is, the subject candidate region S (n) selected as the subject region is the candidate that, on the input image P (n) of the current frame, is closest to the position of the subject region of the previous frame.

被写体候補領域が複数検出された場合、前フレームの被写体領域から近い位置にある被写体候補領域内に、追尾対象となる被写体が含まれている可能性が高い。そこで、被写体領域決定部１２３は、中心位置Ｃ（ｎ−１）が含まれる被写体候補領域を、現フレームの被写体領域として選択する。 When a plurality of subject candidate areas are detected, there is a high possibility that a subject to be tracked is included in the subject candidate area located near the subject area of the previous frame. Therefore, the subject area determination unit 123 selects a subject candidate area including the center position C (n−1) as the subject area of the current frame.

なお、入力画像上に中心位置Ｃ（ｎ−１）が含まれる被写体候補領域が複数ある場合、それらの被写体候補領域のうち、最も被写体らしさの評価の高い被写体候補領域が、現フレームの被写体領域とされる。 If the input image contains a plurality of subject candidate regions that include the center position C (n−1), the one with the highest subject-likeness evaluation among them is taken as the subject region of the current frame.

そのような場合、例えば被写体候補領域決定部１２２は、２値化前の被写体マップに基づいて、各被写体候補領域の被写体らしさの評価を示す評価値を算出する。例えば、被写体マップ上の被写体候補領域と同じ領域内の画素の画素値の平均値または最大値が、その被写体候補領域の評価値とされる。そして、被写体領域決定部１２３は、被写体候補領域決定部１２２により算出された評価値を用いて、中心位置Ｃ（ｎ−１）を含む被写体候補領域のうち、評価値が最大の被写体候補領域を被写体領域として選択する。 In such a case, for example, the subject candidate area determination unit 122 calculates, from the subject map before binarization, an evaluation value indicating the subject likeness of each subject candidate area. For example, the average or the maximum of the pixel values in the part of the subject map that coincides with a subject candidate area is used as the evaluation value of that area. Using the evaluation values calculated by the subject candidate area determination unit 122, the subject area determination unit 123 then selects, from the subject candidate areas containing the center position C (n−1), the one with the largest evaluation value as the subject area.

また、被写体候補領域は検出されたが、中心位置Ｃ（ｎ−１）が含まれる被写体候補領域がないこともあり得る。そのような場合、検出された被写体候補領域のうちの評価値が最大のものが被写体領域とされてもよいし、中心位置Ｃ（ｎ−１）から最も近い位置に中心がある被写体候補領域が、被写体領域として選択されてもよい。 It is also possible that subject candidate areas are detected but none of them contains the center position C (n−1). In such a case, the detected subject candidate area with the largest evaluation value may be taken as the subject area, or the subject candidate area whose center is closest to the center position C (n−1) may be selected as the subject area.
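
The selection rules of step S104, including the tie-break and the two fallbacks, can be summarised in code. This is a sketch under stated assumptions: the rectangle layout, the function names, the `fallback` switch, and the behaviour when no candidate exists at all (returning `None`) are hypothetical, and the mean is used as the evaluation value although the text equally allows the maximum.

```python
from typing import List, Optional, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # hypothetical layout: (left, top, right, bottom), inclusive


def mean_score(subject_map: List[List[float]], rect: Rect) -> float:
    """Evaluation value: mean of the pre-binarization map inside the rectangle."""
    left, top, right, bottom = rect
    values = [subject_map[y][x]
              for y in range(top, bottom + 1) for x in range(left, right + 1)]
    return sum(values) / len(values)


def select_subject_region(candidates: Sequence[Rect], scores: Sequence[float],
                          prev_center: Tuple[float, float],
                          fallback: str = "nearest") -> Optional[Rect]:
    """Pick the current subject region from the candidates (step S104)."""
    if not candidates:
        return None  # assumption: the text does not cover this case
    cx, cy = prev_center
    indices = range(len(candidates))
    # Candidates that contain C(n-1), the previous region's center.
    hits = [i for i in indices
            if candidates[i][0] <= cx <= candidates[i][2]
            and candidates[i][1] <= cy <= candidates[i][3]]
    if hits:
        return candidates[max(hits, key=lambda i: scores[i])]
    if fallback == "score":
        return candidates[max(indices, key=lambda i: scores[i])]

    def dist2(i: int) -> float:
        left, top, right, bottom = candidates[i]
        return ((left + right) / 2 - cx) ** 2 + ((top + bottom) / 2 - cy) ** 2

    return candidates[min(indices, key=dist2)]
```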

図１４のフローチャートの説明に戻り、ステップＳ１０５において、被写体領域決定部１２３は、特定された現フレームの被写体領域の位置を示す被写体領域情報を生成し、表示制御部２３に供給するとともに、現フレームの被写体領域情報を保持部５４に供給し、保持させる。 Returning to the description of the flowchart of FIG. 14, in step S105, the subject region determination unit 123 generates subject region information indicating the position of the subject region of the identified current frame and supplies the subject region information to the display control unit 23. The subject area information is supplied to the holding unit 54 and held.

現フレームの被写体領域情報が生成されると、ビジュアルアテンションによる被写体検出処理は終了し、その後、処理は図１０のステップＳ１５に進む。 When the subject area information of the current frame is generated, the subject detection process by visual attention ends, and the process then proceeds to step S15 in FIG. 10.

このようにして、トラッキング部５３は、被写体マップを用いて入力画像から任意の被写体らしい領域を被写体候補領域として抽出する。そして、トラッキング部５３は、それらの被写体候補領域のなかから、前フレームの被写体領域の中心位置を含むものを、追尾対象の被写体が含まれる現フレームの被写体領域として選択する。このように、被写体マップを利用して、現フレームにおける被写体の位置を検出すれば、入力画像が平坦である場合でも、より高い精度で被写体を検出することができる。 In this way, the tracking unit 53 uses the subject map to extract an arbitrary subject-like region from the input image as a subject candidate region. Then, the tracking unit 53 selects, from these subject candidate regions, the one including the center position of the subject region of the previous frame as the subject region of the current frame including the subject to be tracked. In this way, if the position of the subject in the current frame is detected using the subject map, the subject can be detected with higher accuracy even when the input image is flat.

［被写体マップ生成処理の説明］ [Description of subject map generation processing]
また、以下、図１６のフローチャートを参照して、図１４のステップＳ１０１の処理に対応する被写体マップ生成処理について説明する。 Hereinafter, the subject map generation process corresponding to the process of step S101 of FIG. 14 will be described with reference to the flowchart of FIG. 16.

ステップＳ１３１において、輝度情報抽出部１５１は、輝度情報抽出処理を行って、判定部６３から供給された入力画像に基づいて輝度情報マップを生成し、被写体マップ生成部１５５に供給する。そして、ステップＳ１３２において、色情報抽出部１５２は、色情報抽出処理を行って、判定部６３から供給された入力画像に基づいて色情報マップを生成し、被写体マップ生成部１５５に供給する。 In step S 131, the luminance information extraction unit 151 performs luminance information extraction processing, generates a luminance information map based on the input image supplied from the determination unit 63, and supplies the luminance information map to the subject map generation unit 155. In step S132, the color information extraction unit 152 performs color information extraction processing, generates a color information map based on the input image supplied from the determination unit 63, and supplies the color information map to the subject map generation unit 155.

ステップＳ１３３において、エッジ情報抽出部１５３は、エッジ情報抽出処理を行って、判定部６３から供給された入力画像に基づいてエッジ情報マップを生成し、被写体マップ生成部１５５に供給する。また、ステップＳ１３４において、顔情報抽出部１５４は、顔情報抽出処理を行って、判定部６３から供給された入力画像に基づいて顔情報マップを生成し、被写体マップ生成部１５５に供給する。 In step S 133, the edge information extraction unit 153 performs edge information extraction processing, generates an edge information map based on the input image supplied from the determination unit 63, and supplies it to the subject map generation unit 155. In step S 134, the face information extraction unit 154 performs face information extraction processing, generates a face information map based on the input image supplied from the determination unit 63, and supplies the face information map to the subject map generation unit 155.

なお、これらの輝度情報抽出処理、色情報抽出処理、エッジ情報抽出処理、および顔情報抽出処理の詳細は後述する。 Details of these luminance information extraction processing, color information extraction processing, edge information extraction processing, and face information extraction processing will be described later.

ステップＳ１３５において、被写体マップ生成部１５５は、輝度情報抽出部１５１乃至顔情報抽出部１５４から供給された輝度情報マップ乃至顔情報マップを用いて、被写体マップを生成し、被写体候補領域決定部１２２に供給する。 In step S135, the subject map generation unit 155 generates a subject map using the luminance information map through the face information map supplied from the luminance information extraction unit 151 through the face information extraction unit 154, and supplies it to the subject candidate region determination unit 122.

例えば、被写体マップ生成部１５５は、情報マップごとに予め求められている重みである、情報重みＷｂを用いて各情報マップを線形結合し、さらに、その結果得られたマップの画素値に、予め求められた重みである、被写体重みＷｃを乗算して正規化し、被写体マップとする。 For example, the subject map generation unit 155 linearly combines the information maps using the information weights Wb, which are weights obtained in advance for each information map, then multiplies the pixel values of the resulting map by the subject weight Wc, a weight obtained in advance, and normalizes the result to obtain the subject map.

つまり、これから求めようとする被写体マップ上の注目する画素を注目画素とすると、各情報マップの注目画素と同じ位置の画素の画素値に、情報マップごとの情報重みＷｂが乗算され、情報重みＷｂの乗算された画素値の総和が、注目画素の画素値とされる。さらに、このようにして求められた被写体マップの各画素の画素値に、被写体マップに対して予め求められた被写体重みＷｃが乗算されて正規化され、最終的な被写体マップとされる。 That is, taking a pixel of the subject map to be obtained as the pixel of interest, the pixel value of the pixel at the same position in each information map is multiplied by that information map's information weight Wb, and the sum of the weighted pixel values is taken as the pixel value of the pixel of interest. The pixel value of each pixel of the subject map obtained in this way is then multiplied by the subject weight Wc, obtained in advance for the subject map, and normalized to give the final subject map.

なお、より詳細には、色情報マップとして、ＲＧの差分の色情報マップと、ＢＹの差分の色情報マップとが用いられ、エッジ情報マップとして、０度、４５度、９０度、１３５度のそれぞれの方向のエッジ情報マップが用いられて、被写体マップが生成される。 More specifically, the subject map is generated using an RG difference color information map and a BY difference color information map as the color information maps, and edge information maps for the 0-degree, 45-degree, 90-degree, and 135-degree directions as the edge information maps.
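
The combination in step S135 is a per-pixel weighted sum followed by scaling. The sketch below assumes all information maps have the same size, and it assumes that "normalization" means dividing by a fixed `max_value` and clamping to the range 0 to 1; the description does not state the target range, so that part in particular is a guess made for illustration.

```python
from typing import List, Sequence

Map = List[List[float]]


def subject_map(info_maps: Sequence[Map], wb: Sequence[float], wc: float,
                max_value: float = 255.0) -> Map:
    """Combine information maps with weights Wb, then apply the subject weight Wc."""
    height, width = len(info_maps[0]), len(info_maps[0][0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Linear combination: sum over maps of Wb times the pixel value.
            combined = sum(w * m[y][x] for w, m in zip(wb, info_maps))
            # Assumed normalization: scale by max_value and clamp to [0, 1].
            scaled = wc * combined / max_value
            row.append(min(1.0, max(0.0, scaled)))
        result.append(row)
    return result
```

With the maps listed above, `info_maps` would hold eight entries: one luminance map, two color maps, four edge maps, and one face map.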

被写体マップが生成されて被写体候補領域決定部１２２に供給されると、被写体マップ生成処理は終了し、その後、処理は図１４のステップＳ１０２へと進む。 When the subject map has been generated and supplied to the subject candidate area determination unit 122, the subject map generation process ends, and the process then proceeds to step S102 in FIG. 14.

［輝度情報抽出処理の説明］ [Description of luminance information extraction processing]
次に、図１７乃至図２０のフローチャートを参照して、図１６のステップＳ１３１乃至ステップＳ１３４の処理のそれぞれに対応する処理について説明する。 Next, the processes corresponding to steps S131 to S134 of FIG. 16 will be described with reference to the flowcharts of FIGS. 17 to 20.

まず、図１７のフローチャートを参照して、図１６のステップＳ１３１の処理に対応する輝度情報抽出処理について説明する。 First, the luminance information extraction process corresponding to the process of step S131 of FIG. 16 will be described with reference to the flowchart of FIG. 17.

ステップＳ１６１において、輝度画像生成部１８１は、判定部６３から供給された入力画像を用いて輝度画像を生成し、ピラミッド画像生成部１８２に供給する。例えば、輝度画像生成部１８１は、入力画像の画素のＲ、Ｇ、およびＢの各成分の値に、成分ごとに予め定められた係数を乗算し、係数の乗算された各成分の値の和を、入力画像の画素と同じ位置にある輝度画像の画素の画素値とする。つまり、輝度成分（Ｙ）および色差成分（Ｃｂ，Ｃｒ）からなるコンポーネント信号の輝度成分が求められる。なお、画素のＲ、Ｇ、およびＢの各成分の値の平均値が、輝度画像の画素の画素値とされてもよい。 In step S 161, the luminance image generation unit 181 generates a luminance image using the input image supplied from the determination unit 63 and supplies the luminance image to the pyramid image generation unit 182. For example, the luminance image generation unit 181 multiplies the values of the R, G, and B components of the pixels of the input image by a coefficient predetermined for each component, and sums the values of the components multiplied by the coefficients. Is the pixel value of the pixel of the luminance image at the same position as the pixel of the input image. That is, the luminance component of the component signal composed of the luminance component (Y) and the color difference components (Cb, Cr) is obtained. Note that the average value of the R, G, and B component values of the pixel may be the pixel value of the pixel of the luminance image.
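
Both ways of forming the luminance image in step S161 can be sketched as follows. The description only says that each component is multiplied by a predetermined coefficient; the ITU-R BT.601 luma coefficients used here are an assumption, as are the function names.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)

# Assumption: ITU-R BT.601 luma coefficients; the text only says "predetermined".
COEFFS = (0.299, 0.587, 0.114)


def luminance_image(rgb: List[List[Pixel]],
                    coeffs: Tuple[float, float, float] = COEFFS) -> List[List[float]]:
    """Weighted sum of the R, G, and B components of every pixel."""
    cr, cg, cb = coeffs
    return [[cr * r + cg * g + cb * b for (r, g, b) in row] for row in rgb]


def mean_luminance_image(rgb: List[List[Pixel]]) -> List[List[float]]:
    """Alternative mentioned in the text: plain average of R, G, and B."""
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in rgb]
```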

ステップＳ１６２において、ピラミッド画像生成部１８２は、輝度画像生成部１８１から供給された輝度画像に基づいて、レベルＬ１乃至レベルＬ８の各階層のピラミッド画像を生成し、差分算出部１８３に供給する。 In step S 162, the pyramid image generation unit 182 generates pyramid images of the respective levels L 1 to L 8 based on the luminance image supplied from the luminance image generation unit 181, and supplies the pyramid image to the difference calculation unit 183.

ステップＳ１６３において、差分算出部１８３は、ピラミッド画像生成部１８２から供給されたピラミッド画像を用いて差分画像を生成して正規化し、輝度情報マップ生成部１８４に供給する。正規化は、差分画像の画素の画素値が、例えば０乃至２５５の間の値となるように行われる。 In step S 163, the difference calculation unit 183 generates and normalizes a difference image using the pyramid image supplied from the pyramid image generation unit 182, and supplies the difference image to the luminance information map generation unit 184. The normalization is performed so that the pixel value of the pixel of the difference image becomes a value between 0 and 255, for example.

具体的には、差分算出部１８３は、各階層の輝度のピラミッド画像のうち、レベルＬ６およびレベルＬ３、レベルＬ７およびレベルＬ３、レベルＬ７およびレベルＬ４、レベルＬ８およびレベルＬ４、並びにレベルＬ８およびレベルＬ５の各階層の組み合わせのピラミッド画像の差分を求める。これにより、合計５つの輝度の差分画像が得られる。 Specifically, the difference calculation unit 183 obtains the differences between the luminance pyramid images for the following combinations of levels: L6 and L3, L7 and L3, L7 and L4, L8 and L4, and L8 and L5. As a result, a total of five luminance difference images are obtained.

例えば、レベルＬ６およびレベルＬ３の組み合わせの差分画像が生成される場合、レベルＬ６のピラミッド画像が、レベルＬ３のピラミッド画像の大きさに合わせてアップコンバートされる。つまり、アップコンバート前のレベルＬ６のピラミッド画像の１つの画素の画素値が、その画素に対応する、アップコンバート後のレベルＬ６のピラミッド画像の互いに隣接するいくつかの画素の画素値とされる。そして、レベルＬ６のピラミッド画像の画素の画素値と、その画素と同じ位置にあるレベルＬ３のピラミッド画像の画素の画素値との差分が求められ、その差分が差分画像の画素の画素値とされる。 For example, when the difference image for the combination of level L6 and level L3 is generated, the level L6 pyramid image is up-converted to the size of the level L3 pyramid image. That is, the pixel value of one pixel of the level L6 pyramid image before up-conversion becomes the pixel value of several mutually adjacent pixels, corresponding to that pixel, in the level L6 pyramid image after up-conversion. Then, the difference between the pixel value of each pixel of the up-converted level L6 pyramid image and the pixel value of the pixel at the same position in the level L3 pyramid image is obtained, and that difference is used as the pixel value of the corresponding pixel of the difference image.

これらの差分画像を生成する処理は、輝度画像にバンドパスフィルタを用いたフィルタ処理を施して、輝度画像から所定の周波数成分を抽出することと等価である。このようにして得られた差分画像の画素の画素値は、各レベルのピラミッド画像の輝度値の差、つまり入力画像における所定の画素における輝度と、その画素の周囲の平均的な輝度との差分を示している。 The process of generating these difference images is equivalent to applying a bandpass filter to the luminance image and extracting a predetermined frequency component from it. The pixel value of each pixel of a difference image obtained in this way indicates the difference between the luminance values of the pyramid images at the two levels, that is, the difference between the luminance at a given pixel in the input image and the average luminance around that pixel.

一般的に、画像において周囲との輝度の差分の大きい領域は、その画像を見る人の目を引く領域であるので、その領域は被写体の領域である可能性が高い。したがって、各差分画像において、より画素値の大きい画素が、より被写体の領域である可能性の高い領域であることを示しているということができる。 In general, an area having a large luminance difference from the surroundings in an image is an area that catches the eye of a person who sees the image, so that the area is highly likely to be a subject area. Therefore, it can be said that in each difference image, a pixel having a larger pixel value is a region that is more likely to be a subject region.
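The pyramid differencing of step S163 can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 2x2 averaging used to build the pyramid, the choice of level L1 as the input resolution, the use of the absolute difference, and min-max normalization are all hypothetical choices, since the text specifies only the level pairs, the up-conversion by pixel replication, and the 0 to 255 range.

```python
# Illustrative sketch; names and the reduction/normalization details are assumptions.
PAIRS = [(6, 3), (7, 3), (7, 4), (8, 4), (8, 5)]  # level pairs named in the text


def downsample(img):
    """Halve the resolution by averaging 2x2 blocks (assumed reduction method)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def build_pyramid(img, levels=8):
    """Return {1: L1, ..., 8: L8}, taking L1 as the input resolution (assumption)."""
    pyr = {1: img}
    for level in range(2, levels + 1):
        pyr[level] = downsample(pyr[level - 1])
    return pyr


def upconvert(img, h, w):
    """Enlarge by replication: one source pixel fills a block of adjacent pixels."""
    sh, sw = len(img), len(img[0])
    return [[img[y * sh // h][x * sw // w] for x in range(w)] for y in range(h)]


def normalize(img, hi=255.0):
    """Min-max scale pixel values into 0..hi (assumed normalization)."""
    flat = [v for row in img for v in row]
    lo_v, hi_v = min(flat), max(flat)
    if hi_v == lo_v:
        return [[0.0 for _ in row] for row in img]
    return [[(v - lo_v) * hi / (hi_v - lo_v) for v in row] for row in img]


def difference_images(pyr):
    """One normalized difference image per level pair, at the finer level's size."""
    result = []
    for coarse, fine in PAIRS:
        target = pyr[fine]
        h, w = len(target), len(target[0])
        up = upconvert(pyr[coarse], h, w)
        diff = [[abs(up[y][x] - target[y][x]) for x in range(w)] for y in range(h)]
        result.append(normalize(diff))
    return result
```

With a 128x128 input, level L8 shrinks to a single pixel, so the coarse levels act as a local average against which the finer levels are compared.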

ステップＳ１６４において、輝度情報マップ生成部１８４は、差分算出部１８３から供給された差分画像に基づいて輝度情報マップを生成し、被写体マップ生成部１５５に供給する。輝度情報マップが輝度情報マップ生成部１８４から被写体マップ生成部１５５に供給されると、輝度情報抽出処理は終了し、処理は図１６のステップＳ１３２に進む。 In step S164, the luminance information map generation unit 184 generates a luminance information map based on the difference images supplied from the difference calculation unit 183 and supplies it to the subject map generation unit 155. When the luminance information map has been supplied from the luminance information map generation unit 184 to the subject map generation unit 155, the luminance information extraction process ends, and the process proceeds to step S132 in FIG. 16.

例えば、輝度情報マップ生成部１８４は、供給された５つの差分画像を、予め求められた差分画像ごとの重みである差分重みＷａにより重み付き加算し、１つの画像を求める。すなわち、各差分画像の同じ位置にある画素の画素値のそれぞれに差分重みＷａが乗算されて、差分重みＷａが乗算された画素値の総和が求められる。 For example, the luminance information map generation unit 184 weights and adds the supplied five difference images with a difference weight Wa that is a weight for each difference image obtained in advance, thereby obtaining one image. That is, the pixel values of the pixels at the same position in each difference image are multiplied by the difference weight Wa to obtain the sum of the pixel values multiplied by the difference weight Wa.

なお、輝度情報マップの生成時において、各差分画像が同じ大きさとなるように、差分画像のアップコンバートが行われる。 Note that, at the time of generating the luminance information map, the difference image is up-converted so that each difference image has the same size.
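The weighted addition described above can be sketched as follows. The function names and the sample weights are hypothetical; the text states only that each difference image has a weight Wa obtained in advance, and that the images are up-converted to a common size before being summed.

```python
# Illustrative sketch; function names and any weight values are assumptions.
def upconvert(img, h, w):
    """Enlarge by pixel replication so that all difference images share one size."""
    sh, sw = len(img), len(img[0])
    return [[img[y * sh // h][x * sw // w] for x in range(w)] for y in range(h)]


def information_map(diff_images, weights):
    """Weighted sum of difference images; weights[i] plays the role of Wa for image i."""
    h = max(len(d) for d in diff_images)
    w = max(len(d[0]) for d in diff_images)
    acc = [[0.0] * w for _ in range(h)]
    for diff, wa in zip(diff_images, weights):
        up = upconvert(diff, h, w)
        for y in range(h):
            for x in range(w):
                acc[y][x] += wa * up[y][x]
    return acc
```

The same routine would serve the color and edge information maps, since the text describes the identical weighted addition for them.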

このようにして、輝度情報抽出部１５１は、入力画像から輝度画像を求め、その輝度画像から輝度情報マップを生成する。このようにして得られた輝度情報マップによれば、入力画像において、輝度の差の大きい領域、つまり入力画像を一瞥した観察者の目に付きやすい領域を簡単に検出することができる。 In this way, the luminance information extraction unit 151 obtains a luminance image from the input image, and generates a luminance information map from the luminance image. According to the luminance information map thus obtained, it is possible to easily detect a region having a large difference in luminance, that is, a region that is easily noticeable by an observer who looks at the input image.

［色情報抽出処理の説明］ [Description of color information extraction processing]
次に、図１８のフローチャートを参照して、図１６のステップＳ１３２の処理に対応する色情報抽出処理について説明する。 Next, the color information extraction processing corresponding to the processing in step S132 in FIG. 16 will be described with reference to the flowchart in FIG. 18.

ステップＳ１９１において、ＲＧ差分画像生成部２１１は、判定部６３から供給された入力画像を用いて、ＲＧ差分画像を生成し、ピラミッド画像生成部２１３に供給する。 In step S191, the RG difference image generation unit 211 generates an RG difference image using the input image supplied from the determination unit 63, and supplies the RG difference image to the pyramid image generation unit 213.

ステップＳ１９２において、ＢＹ差分画像生成部２１２は、判定部６３から供給された入力画像を用いてＢＹ差分画像を生成し、ピラミッド画像生成部２１４に供給する。 In step S192, the BY difference image generation unit 212 generates a BY difference image using the input image supplied from the determination unit 63 and supplies the BY difference image to the pyramid image generation unit 214.

ステップＳ１９３において、ピラミッド画像生成部２１３およびピラミッド画像生成部２１４は、ＲＧ差分画像生成部２１１からのＲＧ差分画像、およびＢＹ差分画像生成部２１２からのＢＹ差分画像を用いて、ピラミッド画像を生成する。 In step S193, the pyramid image generation unit 213 and the pyramid image generation unit 214 generate pyramid images using the RG difference image from the RG difference image generation unit 211 and the BY difference image from the BY difference image generation unit 212, respectively.

例えば、ピラミッド画像生成部２１３は、解像度の異なる複数のＲＧ差分画像を生成することにより、レベルＬ１乃至レベルＬ８の各階層のピラミッド画像を生成し、差分算出部２１５に供給する。同様に、ピラミッド画像生成部２１４は、解像度の異なる複数のＢＹ差分画像を生成することにより、レベルＬ１乃至レベルＬ８の各階層のピラミッド画像を生成し、差分算出部２１６に供給する。 For example, the pyramid image generation unit 213 generates a plurality of RG difference images having different resolutions, thereby generating pyramid images of the respective levels L1 to L8 and supplies the pyramid images to the difference calculation unit 215. Similarly, the pyramid image generation unit 214 generates a plurality of BY difference images having different resolutions, thereby generating pyramid images of the respective levels L1 to L8 and supplies the pyramid images to the difference calculation unit 216.

ステップＳ１９４において、差分算出部２１５および差分算出部２１６は、ピラミッド画像生成部２１３およびピラミッド画像生成部２１４から供給されたピラミッド画像に基づいて差分画像を生成して正規化し、色情報マップ生成部２１７および色情報マップ生成部２１８に供給する。差分画像の正規化では、例えば、画素の画素値が０乃至２５５の間の値となるようにされる。 In step S194, the difference calculation unit 215 and the difference calculation unit 216 generate difference images based on the pyramid images supplied from the pyramid image generation unit 213 and the pyramid image generation unit 214, normalize them, and supply them to the color information map generation unit 217 and the color information map generation unit 218, respectively. In the normalization of the difference images, for example, the pixel values are scaled to fall within a range of 0 to 255.

例えば、差分算出部２１５は、各階層のＲＧの差分のピラミッド画像のうち、レベルＬ６およびレベルＬ３、レベルＬ７およびレベルＬ３、レベルＬ７およびレベルＬ４、レベルＬ８およびレベルＬ４、並びにレベルＬ８およびレベルＬ５の各階層の組み合わせのピラミッド画像の差分を求める。これにより、合計５つのＲＧの差分の差分画像が得られる。 For example, among the RG difference pyramid images of the respective levels, the difference calculation unit 215 obtains the differences between the pyramid images for the following combinations of levels: L6 and L3, L7 and L3, L7 and L4, L8 and L4, and L8 and L5. As a result, a total of five difference images of the RG difference are obtained.

同様に、差分算出部２１６は、各階層のＢＹの差分のピラミッド画像のうち、レベルＬ６およびレベルＬ３、レベルＬ７およびレベルＬ３、レベルＬ７およびレベルＬ４、レベルＬ８およびレベルＬ４、並びにレベルＬ８およびレベルＬ５の各階層の組み合わせのピラミッド画像の差分を求める。これにより、合計５つのＢＹの差分の差分画像が得られる。 Similarly, among the BY difference pyramid images of the respective levels, the difference calculation unit 216 obtains the differences between the pyramid images for the following combinations of levels: L6 and L3, L7 and L3, L7 and L4, L8 and L4, and L8 and L5. As a result, a total of five difference images of the BY difference are obtained.

これらの差分画像を生成する処理は、ＲＧ差分画像またはＢＹ差分画像にバンドパスフィルタを用いたフィルタ処理を施して、ＲＧ差分画像またはＢＹ差分画像から所定の周波数成分を抽出することと等価である。このようにして得られた差分画像の画素の画素値は、各レベルのピラミッド画像の特定の色成分の差、つまり入力画像の画素における特定の色の成分と、その画素の周囲の平均的な特定の色の成分との差分を示している。 The process of generating these difference images is equivalent to applying a bandpass filter to the RG difference image or the BY difference image and extracting a predetermined frequency component from it. The pixel value of each pixel of a difference image obtained in this way indicates the difference between the specific color components of the pyramid images at the two levels, that is, the difference between the specific color component at a pixel of the input image and the average of that color component around the pixel.

一般的に、画像において周囲と比べて目立つ色の領域、つまり特定の色成分の周囲との差分の大きい領域は、その画像を見る人の目を引く領域であるので、その領域は被写体の領域である可能性が高い。したがって、各差分画像において、より画素値の大きい画素が、より被写体の領域である可能性の高い領域であることを示しているということができる。 In general, a region of an image whose color stands out from its surroundings, that is, a region in which a specific color component differs greatly from its surroundings, catches the eye of a person viewing the image, so that region is likely to be a subject region. Therefore, in each difference image, a pixel with a larger pixel value can be said to indicate a region that is more likely to be a subject region.

ステップＳ１９５において、色情報マップ生成部２１７および色情報マップ生成部２１８は、差分算出部２１５からの差分画像、および差分算出部２１６からの差分画像を用いて色情報マップを生成し、被写体マップ生成部１５５に供給する。 In step S195, the color information map generation unit 217 and the color information map generation unit 218 generate color information maps using the difference images from the difference calculation unit 215 and the difference images from the difference calculation unit 216, respectively, and supply them to the subject map generation unit 155.

例えば、色情報マップ生成部２１７は、差分算出部２１５から供給されたＲＧの差分の差分画像を、予め求められた差分画像ごとの差分重みＷａにより重み付き加算して、１つのＲＧの差分の色情報マップとする。 For example, the color information map generation unit 217 performs weighted addition of the RG difference images supplied from the difference calculation unit 215, using the difference weight Wa obtained in advance for each difference image, to produce a single color information map of the RG difference.

同様に、色情報マップ生成部２１８は、差分算出部２１６から供給されたＢＹの差分の差分画像を、予め求められた差分重みＷａにより重み付き加算して、１つのＢＹの差分の色情報マップとする。なお、色情報マップの生成時において、各差分画像が同じ大きさとなるように、差分画像のアップコンバートが行われる。 Similarly, the color information map generation unit 218 performs weighted addition of the BY difference images supplied from the difference calculation unit 216, using the difference weight Wa obtained in advance, to produce a single color information map of the BY difference. Note that when the color information maps are generated, the difference images are up-converted so that they all have the same size.

色情報マップ生成部２１７および色情報マップ生成部２１８が、このようにして得られたＲＧの差分の色情報マップ、およびＢＹの差分の色情報マップを被写体マップ生成部１５５に供給すると、色情報抽出処理は終了し、処理は図１６のステップＳ１３３に進む。 When the color information map generation unit 217 and the color information map generation unit 218 have supplied the RG difference color information map and the BY difference color information map obtained in this way to the subject map generation unit 155, the color information extraction process ends, and the process proceeds to step S133 in FIG. 16.

このようにして、色情報抽出部１５２は、入力画像から特定の色の成分の差分の画像を求め、その画像から色情報マップを生成する。このようにして得られた色情報マップによれば、入力画像において、特定の色成分の差の大きい領域、つまり入力画像を一瞥した観察者の目に付きやすい領域を簡単に検出することができる。 In this way, the color information extraction unit 152 obtains images of the differences between specific color components from the input image and generates color information maps from those images. With the color information maps obtained in this way, a region of the input image in which a specific color component differs greatly from its surroundings, that is, a region that readily catches the eye of an observer glancing at the input image, can be detected easily.

なお、色情報抽出部１５２では、入力画像から抽出される色の情報として、Ｒ（赤）の成分と、Ｇ（緑）の成分の差分、およびＢ（青）の成分と、Ｙ（黄）の成分との差分が抽出されると説明したが、色差成分Ｃｒと色差成分Ｃｂなどが抽出されるようにしてもよい。ここで、色差成分Ｃｒは、Ｒ成分と輝度成分との差分であり、色差成分Ｃｂは、Ｂ成分と輝度成分との差分である。 Although it has been described that the color information extraction unit 152 extracts, as color information from the input image, the difference between the R (red) component and the G (green) component and the difference between the B (blue) component and the Y (yellow) component, the color difference components Cr and Cb or the like may be extracted instead. Here, the color difference component Cr is the difference between the R component and the luminance component, and the color difference component Cb is the difference between the B component and the luminance component.
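The two kinds of color information mentioned above can be sketched per pixel as follows. Several details are assumptions not stated in this passage: the yellow component is taken as the mean of R and G, the differences are kept signed, the luminance uses the ITU-R BT.601 luma weights, and Cr and Cb are left unscaled. The function names are hypothetical.

```python
# Illustrative sketch; the yellow definition, signed differences, and luma weights are assumptions.
def rg_by(pixel):
    """Return (R - G, B - Y) for an (r, g, b) pixel, with Y assumed to be (R + G) / 2."""
    r, g, b = pixel
    yellow = (r + g) / 2.0
    return r - g, b - yellow


def cr_cb(pixel):
    """Return (R - luminance, B - luminance), using BT.601 luma weights for luminance."""
    r, g, b = pixel
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    return r - luma, b - luma


def rg_by_images(rgb_image):
    """Split an image (rows of (r, g, b) tuples) into an RG image and a BY image."""
    rg = [[rg_by(p)[0] for p in row] for row in rgb_image]
    by = [[rg_by(p)[1] for p in row] for row in rgb_image]
    return rg, by
```

A neutral gray pixel yields values near zero in every channel, which matches the intent that only regions with a distinctive color produce a strong response.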

［エッジ情報抽出処理の説明］ [Description of edge information extraction processing]
図１９は、図１６のステップＳ１３３の処理に対応するエッジ情報抽出処理を説明するフローチャートである。以下、このエッジ情報抽出処理について説明する。 FIG. 19 is a flowchart explaining the edge information extraction processing corresponding to the processing in step S133 in FIG. 16. This edge information extraction processing is described below.

ステップＳ２２１において、エッジ画像生成部２４１乃至エッジ画像生成部２４４は、判定部６３から供給された入力画像に対して、ガボアフィルタを用いたフィルタ処理を施し、０度、４５度、９０度、および１３５度の方向のエッジ強度を画素の画素値とするエッジ画像を生成する。そして、エッジ画像生成部２４１乃至エッジ画像生成部２４４は、生成されたエッジ画像をピラミッド画像生成部２４５乃至ピラミッド画像生成部２４８に供給する。 In step S221, the edge image generation units 241 to 244 apply filter processing using a Gabor filter to the input image supplied from the determination unit 63, and generate edge images whose pixel values are the edge strengths in the 0-degree, 45-degree, 90-degree, and 135-degree directions. The edge image generation units 241 to 244 then supply the generated edge images to the pyramid image generation units 245 to 248, respectively.

ステップＳ２２２において、ピラミッド画像生成部２４５乃至ピラミッド画像生成部２４８は、エッジ画像生成部２４１乃至エッジ画像生成部２４４からのエッジ画像を用いて、ピラミッド画像を生成し、差分算出部２４９乃至差分算出部２５２に供給する。 In step S222, the pyramid image generation units 245 to 248 generate pyramid images using the edge images from the edge image generation units 241 to 244 and supply them to the difference calculation units 249 to 252, respectively.

例えば、ピラミッド画像生成部２４５は、解像度の異なる複数の０度方向のエッジ画像を生成することにより、レベルＬ１乃至レベルＬ８の各階層のピラミッド画像を生成し、差分算出部２４９に供給する。同様に、ピラミッド画像生成部２４６乃至ピラミッド画像生成部２４８は、レベルＬ１乃至レベルＬ８の各階層のピラミッド画像を生成して差分算出部２５０乃至差分算出部２５２に供給する。 For example, the pyramid image generation unit 245 generates a plurality of edge images in the 0-degree direction having different resolutions, thereby generating pyramid images of the respective levels L1 to L8 and supplies the pyramid images to the difference calculation unit 249. Similarly, the pyramid image generation unit 246 to the pyramid image generation unit 248 generate pyramid images of the respective levels L1 to L8 and supply them to the difference calculation unit 250 to the difference calculation unit 252.

ステップＳ２２３において、差分算出部２４９乃至差分算出部２５２は、ピラミッド画像生成部２４５乃至ピラミッド画像生成部２４８からのピラミッド画像を用いて差分画像を生成して正規化し、エッジ情報マップ生成部２５３乃至エッジ情報マップ生成部２５６に供給する。差分画像の正規化では、例えば、画素の画素値が０乃至２５５の間の値となるようにされる。 In step S223, the difference calculation units 249 to 252 generate difference images using the pyramid images from the pyramid image generation units 245 to 248, normalize them, and supply them to the edge information map generation units 253 to 256, respectively. In the normalization of the difference images, for example, the pixel values are scaled to fall within a range of 0 to 255.

例えば、差分算出部２４９は、ピラミッド画像生成部２４５から供給された、各階層の０度方向のエッジのピラミッド画像のうち、レベルＬ６およびレベルＬ３、レベルＬ７およびレベルＬ３、レベルＬ７およびレベルＬ４、レベルＬ８およびレベルＬ４、並びにレベルＬ８およびレベルＬ５の組み合わせのピラミッド画像の差分を求める。これにより、合計５つのエッジの差分画像が得られる。 For example, among the pyramid images of the 0-degree-direction edges of the respective levels supplied from the pyramid image generation unit 245, the difference calculation unit 249 obtains the differences between the pyramid images for the following combinations of levels: L6 and L3, L7 and L3, L7 and L4, L8 and L4, and L8 and L5. As a result, a total of five edge difference images are obtained.

同様に、差分算出部２５０乃至差分算出部２５２は、各階層のピラミッド画像のうち、レベルＬ６およびレベルＬ３、レベルＬ７およびレベルＬ３、レベルＬ７およびレベルＬ４、レベルＬ８およびレベルＬ４、並びにレベルＬ８およびレベルＬ５の各階層の組み合わせのピラミッド画像の差分を求める。これにより、各方向のエッジについて、それぞれ合計５つの差分画像が得られる。 Similarly, among the pyramid images of the respective levels, the difference calculation units 250 to 252 each obtain the differences between the pyramid images for the following combinations of levels: L6 and L3, L7 and L3, L7 and L4, L8 and L4, and L8 and L5. As a result, a total of five difference images are obtained for the edges in each direction.

これらの差分画像を生成する処理は、エッジ画像にバンドパスフィルタを用いたフィルタ処理を施して、エッジ画像から所定の周波数成分を抽出することと等価である。このようにして得られた差分画像の画素の画素値は、各レベルのピラミッド画像のエッジ強度の差、つまり入力画像の所定の位置のエッジ強度と、その位置の周囲の平均的なエッジ強度との差を示している。 The process of generating these difference images is equivalent to applying a bandpass filter to the edge image and extracting a predetermined frequency component from it. The pixel value of each pixel of a difference image obtained in this way indicates the difference between the edge strengths of the pyramid images at the two levels, that is, the difference between the edge strength at a given position in the input image and the average edge strength around that position.

一般的に、画像において周囲と比べてエッジ強度の強い領域は、その画像を見る人の目を引く領域であるので、その領域は被写体の領域である可能性が高い。したがって、各差分画像において、より画素値の大きい画素が、より被写体の領域である可能性の高い領域であることを示しているということができる。 Generally, an area having a higher edge strength than the surrounding area in an image is an area that catches the eye of a person who sees the image, and therefore, there is a high possibility that the area is a subject area. Therefore, it can be said that in each difference image, a pixel having a larger pixel value is a region that is more likely to be a subject region.

ステップＳ２２４において、エッジ情報マップ生成部２５３乃至エッジ情報マップ生成部２５６は、差分算出部２４９乃至差分算出部２５２からの差分画像を用いて各方向のエッジ情報マップを生成し、被写体マップ生成部１５５に供給する。 In step S224, the edge information map generation units 253 to 256 generate edge information maps for the respective directions using the difference images from the difference calculation units 249 to 252 and supply them to the subject map generation unit 155.

例えば、エッジ情報マップ生成部２５３は、差分算出部２４９から供給された差分画像を、予め求められた差分重みＷａにより重み付き加算して０度方向のエッジ情報マップとする。 For example, the edge information map generation unit 253 weights and adds the difference image supplied from the difference calculation unit 249 with the difference weight Wa obtained in advance to obtain an edge information map in the 0 degree direction.

同様に、エッジ情報マップ生成部２５４は差分算出部２５０からの差分画像を差分重みＷａにより重み付き加算し、エッジ情報マップ生成部２５５は差分算出部２５１からの差分画像を差分重みＷａにより重み付き加算し、エッジ情報マップ生成部２５６は差分算出部２５２からの差分画像を差分重みＷａにより重み付き加算する。これにより、４５度、９０度、および１３５度の各方向のエッジ情報マップが得られる。なお、エッジ情報マップの生成時において、各差分画像が同じ大きさとなるように、差分画像のアップコンバートが行われる。 Similarly, the edge information map generation unit 254 weights and adds the difference image from the difference calculation unit 250 with the difference weight Wa, and the edge information map generation unit 255 weights the difference image from the difference calculation unit 251 with the difference weight Wa. In addition, the edge information map generation unit 256 weights and adds the difference image from the difference calculation unit 252 with the difference weight Wa. As a result, edge information maps in directions of 45 degrees, 90 degrees, and 135 degrees are obtained. Note that when the edge information map is generated, the difference image is up-converted so that each difference image has the same size.

エッジ情報マップ生成部２５３乃至エッジ情報マップ生成部２５６が、このようにして得られた各方向の合計４つのエッジ情報マップを被写体マップ生成部１５５に供給すると、エッジ情報抽出処理は終了し、処理は図１６のステップＳ１３４に進む。 When the edge information map generation units 253 to 256 have supplied the total of four edge information maps, one for each direction, obtained in this way to the subject map generation unit 155, the edge information extraction process ends, and the process proceeds to step S134 in FIG. 16.

このようにして、エッジ情報抽出部１５３は、入力画像から特定の方向のエッジの差分画像を求め、その差分画像からエッジ情報マップを生成する。このようにして得られた方向ごとのエッジ情報マップによれば、入力画像において、周囲の領域と比べて、特定の方向へのエッジ強度の大きい領域、つまり入力画像を一瞥した観察者の目に付きやすい領域を簡単に検出することができる。 In this way, the edge information extraction unit 153 obtains difference images of edges in specific directions from the input image and generates edge information maps from those difference images. With the per-direction edge information maps obtained in this way, a region of the input image whose edge strength in a specific direction is greater than that of the surrounding regions, that is, a region that readily catches the eye of an observer glancing at the input image, can be detected easily.

なお、エッジ情報抽出処理では、エッジの抽出にガボアフィルタが用いられると説明したが、その他、Sobelフィルタや、Robertsフィルタなどのエッジ抽出フィルタが用いられるようにしてもよい。 In the edge information extraction process, it has been described that a Gabor filter is used for edge extraction. However, an edge extraction filter such as a Sobel filter or a Roberts filter may be used.
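The directional filtering of step S221 can be sketched with a small real-valued Gabor kernel, as below. The kernel size, sigma, wavelength, aspect ratio, mean subtraction (to suppress the response to flat regions), border clamping, and the mapping from an angle to the kernel orientation are all assumptions for illustration; the text states only that a Gabor filter yields edge strengths in the 0, 45, 90, and 135 degree directions.

```python
import math


# Illustrative sketch; every kernel parameter and function name here is an assumption.
def gabor_kernel(theta_deg, size=5, sigma=2.0, wavelength=4.0, gamma=1.0):
    """Real Gabor kernel whose cosine carrier varies along direction theta."""
    t = math.radians(theta_deg)
    r = size // 2
    kernel = []
    for y in range(-r, r + 1):
        row = []
        for x in range(-r, r + 1):
            xr = x * math.cos(t) + y * math.sin(t)
            yr = -x * math.sin(t) + y * math.cos(t)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / float(size * size)
    return [[v - mean for v in row] for row in kernel]  # zero-mean kernel


def edge_image(gray, theta_deg):
    """Edge strength per pixel: absolute filter response, borders clamped."""
    kernel = gabor_kernel(theta_deg)
    r = len(kernel) // 2
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + r][dx + r] * gray[yy][xx]
            out[y][x] = abs(acc)
    return out


def edge_images(gray):
    """One edge image for each of the four directions named in the text."""
    return {angle: edge_image(gray, angle) for angle in (0, 45, 90, 135)}
```

Swapping `gabor_kernel` for a Sobel or Roberts kernel would correspond to the alternative edge extraction filters mentioned above.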

［顔情報抽出処理の説明］ [Description of face information extraction processing]
次に、図２０のフローチャートを参照して、図１６のステップＳ１３４の処理に対応する顔情報抽出処理について説明する。 Next, the face information extraction processing corresponding to the processing in step S134 in FIG. 16 will be described with reference to the flowchart in FIG. 20.

ステップＳ２５１において、顔検出部２８１は、判定部６３から供給された入力画像から、人の顔の領域を検出し、その検出結果を顔情報マップ生成部２８２に供給する。例えば、顔検出部２８１は、入力画像に対してガボアフィルタを用いたフィルタ処理を施し、入力画像から人の目、口、鼻などの特徴的な領域を抽出することにより、入力画像における顔の領域を検出する。 In step S251, the face detection unit 281 detects a human face region from the input image supplied from the determination unit 63 and supplies the detection result to the face information map generation unit 282. For example, the face detection unit 281 detects the face region in the input image by applying filter processing using a Gabor filter to the input image and extracting characteristic regions such as human eyes, mouth, and nose.

ステップＳ２５２において、顔情報マップ生成部２８２は、顔検出部２８１からの検出結果を用いて顔情報マップを生成し、被写体マップ生成部１５５に供給する。 In step S252, the face information map generation unit 282 generates a face information map using the detection result from the face detection unit 281 and supplies the face information map to the subject map generation unit 155.

例えば、入力画像からの顔の検出結果として、顔が含まれると推定される入力画像上の矩形の領域（以下、候補領域と称する）が複数検出されたとする。ここで、入力画像上の所定の位置近傍に複数の候補領域が検出され、それらの候補領域の一部が互いに重なることもあることとする。すなわち、例えば、入力画像上の１つの顔の領域に対して、その顔を含む複数の領域が候補領域として得られた場合には、それらの候補領域の一部が互いに重なることになる。 For example, it is assumed that a plurality of rectangular areas (hereinafter referred to as candidate areas) on the input image that are estimated to include a face are detected as face detection results from the input image. Here, a plurality of candidate areas are detected near a predetermined position on the input image, and some of these candidate areas may overlap each other. That is, for example, when a plurality of areas including the face are obtained as candidate areas for one face area on the input image, some of these candidate areas overlap each other.

顔情報マップ生成部２８２は、顔の検出により得られた候補領域に対して、候補領域ごとに、入力画像と同じ大きさの検出画像を生成する。この検出画像は、検出画像上における処理対象の候補領域と同じ領域内の画素の画素値が、候補領域とは異なる領域内の画素の画素値よりも大きい値とされる。 The face information map generation unit 282 generates a detection image having the same size as the input image for each candidate region, with respect to the candidate region obtained by face detection. In this detected image, the pixel value of a pixel in the same region as the candidate region to be processed on the detected image is set to a value larger than the pixel value of a pixel in a region different from the candidate region.

また、検出画像上の画素の画素値は、より人の顔が含まれる可能性が高いと推定された候補領域の画素と同じ位置の画素ほど、画素値が大きくなる。顔情報マップ生成部２８２は、このようにして得られた検出画像を足し合わせて、１つの画像を生成して正規化し、顔情報マップとする。したがって、顔情報マップ上において、入力画像上の複数の候補領域の一部が重なる領域と同じ位置の領域の画素の画素値は大きくなり、より顔が含まれる可能性が高いことになる。なお、正規化は、顔情報マップの画素の画素値が、例えば０乃至２５５の間の値となるようにされる。 In addition, the pixel value of the pixel on the detected image has a larger pixel value as the pixel is located at the same position as the pixel in the candidate area that is estimated to be more likely to include a human face. The face information map generation unit 282 adds the detected images obtained in this way, generates one image, normalizes it, and sets it as a face information map. Therefore, on the face information map, the pixel value of the pixel in the region at the same position as the region where a part of the plurality of candidate regions on the input image overlaps increases, and there is a high possibility that a face will be included. Note that the normalization is performed so that the pixel value of the face information map pixel is a value between 0 and 255, for example.
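The construction of the face information map from overlapping candidate regions can be sketched as follows. The candidate tuple layout, the per-candidate score standing in for the estimated likelihood of containing a face, the value 0 outside a candidate region, and scaling by the maximum are assumptions; the text requires only that pixels inside a candidate be larger than those outside, that more likely candidates yield larger values, and that the summed image be normalized to, for example, 0 to 255.

```python
# Illustrative sketch; the candidate format, scores, and normalization are assumptions.
def face_information_map(width, height, candidates):
    """candidates: list of (x, y, w, h, score) rectangles in input-image coordinates."""
    acc = [[0.0] * width for _ in range(height)]
    for x, y, w, h, score in candidates:
        # Adding score inside the rectangle equals summing one detection image
        # per candidate that is `score` inside the region and 0 elsewhere.
        for yy in range(max(y, 0), min(y + h, height)):
            for xx in range(max(x, 0), min(x + w, width)):
                acc[yy][xx] += score
    peak = max(max(row) for row in acc)
    if peak == 0:
        return acc
    return [[255.0 * v / peak for v in row] for row in acc]
```

Where several candidate rectangles overlap, the accumulated value is highest, which reflects the statement that overlapping regions are more likely to contain a face.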

顔情報マップが生成されると、顔情報抽出処理は終了し、処理は図１６のステップＳ１３５に進む。 When the face information map has been generated, the face information extraction process ends, and the process proceeds to step S135 in FIG. 16.

このようにして、顔情報抽出部１５４は、入力画像から顔を検出し、その検出結果から顔情報マップを生成する。このようにして得られた顔情報マップによれば、入力画像において、被写体としての人の顔の領域を簡単に検出することができる。 In this way, the face information extraction unit 154 detects a face from the input image and generates a face information map from the detection result. According to the face information map obtained in this way, it is possible to easily detect a human face area as a subject in an input image.

以上において説明した輝度情報抽出処理乃至顔情報抽出処理により、各情報マップが得られ、これらの情報マップから被写体マップが生成される。 The respective information maps are obtained by the luminance information extraction processing through the face information extraction processing described above, and the subject map is generated from these information maps.

〈第２の実施の形態〉 <Second Embodiment>
［画像処理装置の構成］ [Configuration of image processing apparatus]
また、以上においては、入力画像の特性に応じて２つのトラッキング方法のうちの何れかを選択する例について説明したが、３以上のトラッキング方法から、入力画像に適した１つのトラッキング方法が選択されるようにしてもよい。 In the above description, an example of selecting one of two tracking methods according to the characteristics of the input image has been described. However, one tracking method suited to the input image may instead be selected from three or more tracking methods.

そのような場合、例えば画像処理装置１１は、図２１に示すように構成される。 In such a case, the image processing apparatus 11 is configured, for example, as shown in FIG. 21.

すなわち、画像処理装置１１は、顔検出部３１１、被写体領域決定部３１２、平坦判定部５１、トラッキング部５２、トラッキング部５３、保持部５４、表示制御部２３、および表示部２４から構成される。なお、図２１において、図２における場合と対応する部分には、同一の符号を付してあり、その説明は適宜省略する。 That is, the image processing apparatus 11 includes a face detection unit 311, a subject area determination unit 312, a flatness determination unit 51, a tracking unit 52, a tracking unit 53, a holding unit 54, a display control unit 23, and a display unit 24. In FIG. 21, the same reference numerals are given to the portions corresponding to those in FIG. 2, and the description thereof will be omitted as appropriate.

図２１の画像処理装置１１では、顔検出部３１１および平坦判定部５１が図１の切り替え部２１に対応し、被写体領域決定部３１２、トラッキング部５２、およびトラッキング部５３のそれぞれが、図１のトラッキング部２２に対応する。 In the image processing apparatus 11 of FIG. 21, the face detection unit 311 and the flatness determination unit 51 correspond to the switching unit 21 of FIG. 1, and each of the subject region determination unit 312, the tracking unit 52, and the tracking unit 53 corresponds to the tracking unit 22 of FIG. 1.

顔検出部３１１は、撮像装置から供給された入力画像から人の顔を検出し、検出の結果、入力画像から顔が検出された場合、被写体領域決定部３１２にその検出結果を供給し、被写体の検出を指示する。また、顔検出部３１１は、顔検出の結果、入力画像から顔が検出されなかった場合、平坦判定部５１に入力画像が平坦であるかの判定を指示する。 The face detection unit 311 detects a human face from the input image supplied from the imaging device. When a face is detected from the input image, the face detection unit 311 supplies the detection result to the subject region determination unit 312 and instructs it to detect the subject. When no face is detected from the input image, the face detection unit 311 instructs the flatness determination unit 51 to determine whether the input image is flat.

被写体領域決定部３１２は、顔検出部３１１から供給された顔の検出結果と、保持部５４に保持されている被写体領域情報とを用いて、入力画像から追尾対象の被写体を検出し、その検出結果を示す被写体領域情報を表示制御部２３と保持部５４に供給する。 The subject region determination unit 312 detects the subject to be tracked from the input image using the face detection result supplied from the face detection unit 311 and the subject region information held in the holding unit 54, and supplies subject region information indicating the detection result to the display control unit 23 and the holding unit 54.

［トラッキング処理の説明］ [Description of tracking process]
次に、図２２のフローチャートを参照して、図２１の画像処理装置１１により行なわれるトラッキング処理について説明する。 Next, the tracking processing performed by the image processing apparatus 11 of FIG. 21 will be described with reference to the flowchart of FIG. 22.

ステップＳ２８１において、顔検出部３１１は、供給された入力画像から、人の顔を検出する。例えば、顔検出部３１１は、入力画像に対してガボアフィルタを用いたフィルタ処理を施して、入力画像から人の目や口、鼻などの特徴的な部位を抽出することにより、入力画像における顔の領域を検出する。 In step S281, the face detection unit 311 detects a human face from the supplied input image. For example, the face detection unit 311 performs a filtering process using a Gabor filter on the input image, and extracts characteristic parts such as a human eye, mouth, and nose from the input image, thereby detecting the face of the input image. Detect areas.

すなわち、顔検出部３１１では、入力画像の特性を特定するための特徴の特徴量として、入力画像の各領域における人の顔らしさを示す値が抽出される。なお、人の顔の検出は、入力画像から肌色の画素を検出することにより行なったり、テンプレートマッチングにより行なったりするようにしてもよい。 That is, the face detection unit 311 extracts a value indicating the human face-likeness in each region of the input image as the feature amount for specifying the characteristics of the input image. The human face may be detected by detecting skin color pixels from the input image or by template matching.

ステップＳ２８２において、顔検出部３１１は、入力画像から人の顔が検出されたか否かを判定する。ステップＳ２８２において、顔が検出されたと判定された場合、顔検出部３１１は、顔の検出結果を被写体領域決定部３１２に供給し、被写体の検出を指示する。追尾対象の被写体の検出が指示されると、処理はステップＳ２８３に進む。 In step S282, the face detection unit 311 determines whether a human face has been detected from the input image. If it is determined in step S282 that a face has been detected, the face detection unit 311 supplies the face detection result to the subject region determination unit 312 and instructs the subject detection. When detection of the tracking target subject is instructed, the process proceeds to step S283.

なお、追尾対象の被写体が人の顔でないことが、予めユーザにより指定されている場合には、入力画像から顔が検出されても、ステップＳ２８２において顔が検出されなかったと判定される。 If the user has previously designated that the subject to be tracked is not a human face, it is determined in step S282 that no face has been detected even if a face is detected from the input image.

ステップＳ２８３において、被写体領域決定部３１２は、顔検出部３１１から供給された顔の検出結果と、保持部５４に保持されている被写体領域情報とを用いて、被写体領域を決定し、被写体領域情報を生成する。 In step S283, the subject region determination unit 312 determines the subject region using the face detection result supplied from the face detection unit 311 and the subject region information held in the holding unit 54, and generates subject region information.

すなわち、被写体領域決定部３１２は、入力画像上において、顔検出部３１１により検出された顔が含まれる矩形領域のうち、前フレームの被写体領域情報により示される被写体領域に最も近い位置にある矩形領域を、現フレームの被写体領域として選択する。そして、被写体領域決定部３１２は、選択した被写体領域の位置を示す被写体領域情報を生成し、表示制御部２３に供給するとともに、被写体領域情報を保持部５４に供給し、保持させる。被写体領域情報が生成されると、その後、処理はステップＳ２８８に進む。 That is, among the rectangular regions on the input image that contain a face detected by the face detection unit 311, the subject region determination unit 312 selects the rectangular region closest to the subject region indicated by the subject region information of the previous frame as the subject region of the current frame. The subject region determination unit 312 then generates subject region information indicating the position of the selected subject region, supplies it to the display control unit 23, and also supplies it to the holding unit 54 to be held there. After the subject region information has been generated, the process proceeds to step S288.
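The selection in step S283 can be sketched as a nearest-rectangle search. Measuring closeness by the distance between rectangle centers is an assumption, as are the function names and the (x, y, width, height) rectangle layout; the text says only that the face rectangle nearest to the previous frame's subject region is chosen.

```python
# Illustrative sketch; the distance measure and rectangle layout are assumptions.
def center(rect):
    """Center point of an (x, y, width, height) rectangle."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)


def select_subject_region(face_rects, previous_rect):
    """Pick the detected face rectangle closest to the previous subject region."""
    px, py = center(previous_rect)

    def squared_distance(rect):
        cx, cy = center(rect)
        return (cx - px) ** 2 + (cy - py) ** 2

    return min(face_rects, key=squared_distance)
```

The returned rectangle would then be stored as the subject region information for use with the next frame.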

また、ステップＳ２８２において、顔が検出されなかったと判定された場合、顔検出部３１１は、平坦判定部５１に入力画像が平坦であるかの判定を指示し、処理はステップＳ２８４に進む。そして、その後、ステップＳ２８４乃至ステップＳ２８９の処理が行われてトラッキング処理は終了するが、これらの処理は図１０のステップＳ１１乃至ステップＳ１６の処理と同様であるので、その説明は省略する。なお、ステップＳ２８９において、処理を終了しないと判定された場合、処理はステップＳ２８１に戻る。 If it is determined in step S282 that no face has been detected, the face detection unit 311 instructs the flatness determination unit 51 to determine whether the input image is flat, and the process proceeds to step S284. After that, the processing from step S284 to step S289 is performed and the tracking processing ends. However, since these processing are the same as the processing from step S11 to step S16 in FIG. 10, the description thereof is omitted. If it is determined in step S289 that the process is not ended, the process returns to step S281.

このようにして、画像処理装置１１は、入力画像から顔を検出し、顔が検出された場合には、その顔検出の結果を用いて、入力画像から追尾対象の被写体を検出する。また、画像処理装置１１は、顔が検出されなかった場合には、入力画像が平坦であるか否かに応じて、ビジュアルアテンションまたは動き検出の何れかを利用して、入力画像から追尾対象の被写体を検出する。 In this way, the image processing apparatus 11 detects a face from the input image and, when a face is detected, uses the result of the face detection to detect the subject to be tracked from the input image. When no face is detected, the image processing apparatus 11 detects the subject to be tracked from the input image using either visual attention or motion detection, depending on whether or not the input image is flat.
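The switching logic of this embodiment can be summarized in a short sketch. The function name, the string labels, and the `target_is_face` flag (standing in for the user designation that the tracked subject is not a face) are hypothetical; the mapping of a flat image to the subject-map (visual attention) method and a non-flat image to motion detection follows the abstract.

```python
# Illustrative sketch; the function name, flag, and labels are assumptions.
def choose_tracking_method(face_detected, image_is_flat, target_is_face=True):
    """Return which tracking method handles the current frame."""
    # A detected face is ignored when the user has said the target is not a face.
    if face_detected and target_is_face:
        return "face_detection"
    if image_is_flat:
        return "visual_attention"  # subject-map based tracking
    return "motion_detection"      # inter-frame motion based tracking
```

For example, a frame with a detected face but with `target_is_face=False` falls through to the flatness check, just as step S282 treats such a frame as containing no face.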

In this way, a characteristic of the input image, such as whether it contains a human face or whether it is flat, is identified, and the subject is detected from the input image by a tracking method well suited to images having that characteristic. The subject can therefore be tracked more simply, quickly, and stably.

<Third Embodiment>
[Configuration of Image Processing Apparatus]
The tracking methods described above use visual attention or motion detection, but tracking may instead use information about the color of each region of the input image or information about contours. In that case, the image processing apparatus 11 is configured, for example, as shown in FIG. 23.

The image processing apparatus 11 in FIG. 23 includes a color determination unit 341, a tracking unit 342, a tracking unit 343, a holding unit 54, a display control unit 23, and a display unit 24. In FIG. 23, parts corresponding to those in FIG. 2 are denoted by the same reference numerals, and their description is omitted as appropriate.

The color determination unit 341 extracts the color components of pixels from the input image as the feature quantity used to identify a characteristic of the input image, and determines whether the color distribution of the foreground, which is the subject region of the input image, is similar to that of the background, which is the region of the input image excluding the subject region. The color determination unit 341 includes a foreground histogram generation unit 351, a background histogram generation unit 352, and a distance calculation unit 353.

The foreground histogram generation unit 351 acquires the input image and subject region information of the previous frame from the holding unit 54, and generates a foreground histogram indicating the color distribution of the subject region (foreground) of the previous frame's input image. The foreground histogram takes ranges of pixel color as its bins, and the frequency of each bin is the number of foreground pixels classified into that bin. The foreground histogram generation unit 351 supplies the generated foreground histogram to the distance calculation unit 353.

The background histogram generation unit 352 acquires the input image and subject region information of the previous frame from the holding unit 54, and generates a background histogram indicating the color distribution of the background of the previous frame's input image. The background histogram takes ranges of pixel color as its bins, and the frequency of each bin is the number of background pixels classified into that bin. The background histogram generation unit 352 supplies the generated background histogram to the distance calculation unit 353.
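The two histograms can be built in a single pass over the previous frame, as in the sketch below. It assumes a single-channel image stored as a list of rows of integer values in `[0, max_value)`, a rectangular subject region given as `(x, y, width, height)`, and equal-width bins; the text does not fix the color space, the bin layout, or any of the names used here.

```python
def foreground_background_histograms(image, region, num_bins=8, max_value=256):
    """Count pixels per color bin, separately for pixels inside the
    subject region (foreground) and outside it (background)."""
    x0, y0, w, h = region
    foreground = [0] * num_bins
    background = [0] * num_bins
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            bin_index = value * num_bins // max_value
            if x0 <= x < x0 + w and y0 <= y < y0 + h:
                foreground[bin_index] += 1
            else:
                background[bin_index] += 1
    return foreground, background
```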

The distance calculation unit 353 calculates the distance between the foreground histogram from the foreground histogram generation unit 351 and the background histogram from the background histogram generation unit 352, that is, the degree of similarity between the two histograms. According to the calculated distance, the distance calculation unit 353 instructs either the tracking unit 342 or the tracking unit 343 to detect the subject from the input image. When instructing the tracking unit 343 to detect the subject, the distance calculation unit 353 also supplies the foreground histogram to the tracking unit 343.

In response to an instruction from the distance calculation unit 353, the tracking unit 342 detects the subject from the input image on the basis of the subject's contour, using the supplied input image of the current frame together with the input image and subject region information of the previous frame held in the holding unit 54. The tracking unit 342 also generates subject region information indicating the detection result, supplies it to the display control unit 23, and supplies it to the holding unit 54 to be held.

In response to an instruction from the distance calculation unit 353, the tracking unit 343 detects the subject from the input image on the basis of the color distribution of the input image, using the supplied input image of the current frame and the foreground histogram from the distance calculation unit 353, and generates subject region information. The tracking unit 343 also supplies the generated subject region information to the display control unit 23, and supplies it to the holding unit 54 to be held.

[Configuration of Tracking Unit 342]
More specifically, the tracking unit 342 and the tracking unit 343 in FIG. 23 are configured as shown in FIG. 24 and FIG. 25, respectively.

FIG. 24 is a diagram illustrating a configuration example of the tracking unit 342. The tracking unit 342 includes a contour image generation unit 381, a contour image generation unit 382, and a subject region determination unit 383.

The contour image generation unit 381 acquires the input image and subject region information of the previous frame from the holding unit 54, generates a foreground contour image indicating the contours in the subject region of the previous frame's input image, and supplies it to the subject region determination unit 383. The contour image generation unit 382 uses the input image of the current frame supplied from the imaging device to generate a contour image indicating the contours of the subjects on that input image, and supplies it to the subject region determination unit 383.

The subject region determination unit 383 detects the subject region on the input image of the current frame by searching the contour image of the current frame, supplied from the contour image generation unit 382, for the region most similar to the foreground contour image of the previous frame, supplied from the contour image generation unit 381. The subject region determination unit 383 supplies subject region information indicating the position of the detected subject region to the display control unit 23 and the holding unit 54.

[Configuration of Tracking Unit 343]
FIG. 25 is a diagram illustrating a more detailed configuration example of the tracking unit 343. The tracking unit 343 includes a histogram generation unit 411 and a subject region determination unit 412.

The histogram generation unit 411 takes a region on the input image of the current frame supplied from the imaging device as a comparison target region, and generates a histogram indicating the color distribution of the pixels in that comparison target region. In doing so, the histogram generation unit 411 refers to the subject region information held in the holding unit 54 and makes the comparison target region the same size as the subject region of the previous frame; it then shifts the position of the comparison target region across the input image so that each region of the input image in turn serves as the comparison target region. The histogram generation unit 411 supplies the histogram generated for each comparison target region of the current frame's input image to the subject region determination unit 412.

The subject region determination unit 412 detects the subject region of the current frame's input image using the foreground histogram of the previous frame, supplied from the distance calculation unit 353, and the histograms of the comparison target regions of the current frame, supplied from the histogram generation unit 411. The subject region determination unit 412 also supplies subject region information indicating the position of the detected subject region to the display control unit 23 and the holding unit 54.

[Description of Tracking Process]
Next, the tracking process performed by the image processing apparatus 11 in FIG. 23 will be described with reference to the flowchart in FIG. 26.

In step S311, the foreground histogram generation unit 351 generates the foreground histogram of the previous frame using the input image and subject region information of the previous frame held in the holding unit 54, and supplies it to the distance calculation unit 353.

In step S312, the background histogram generation unit 352 generates the background histogram of the previous frame using the input image and subject region information of the previous frame held in the holding unit 54, and supplies it to the distance calculation unit 353.

In step S313, the distance calculation unit 353 calculates the distance between the foreground histogram from the foreground histogram generation unit 351 and the background histogram from the background histogram generation unit 352. For example, the EMD (Earth Mover's Distance) or a similar measure is used as the distance between the foreground histogram and the background histogram.
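As a rough illustration of the distance in step S313, the sketch below computes the EMD for the special case of one-dimensional histograms with the same bins, where the ground distance between bins is the difference of their indices. In that case, after normalizing both histograms to unit mass, the EMD equals the sum of the absolute differences of their cumulative distributions. The restriction to one dimension and the function name `emd_1d` are assumptions; the text does not say how the EMD is set up for color histograms.

```python
def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two 1-D histograms over the same bins.

    Both histograms are normalized to total mass 1, so regions with
    different pixel counts (foreground vs. background) can be compared.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = float(sum(hist_a)), float(sum(hist_b))
    if total_a == 0 or total_b == 0:
        raise ValueError("histograms must not be empty")
    carried = 0.0   # mass that still has to be moved past the current bin
    distance = 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a / total_a - b / total_b
        distance += abs(carried)
    return distance
```

A small value means the two color distributions are similar, and a large value means they differ.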

In step S314, the distance calculation unit 353 determines whether the calculated distance between the foreground histogram and the background histogram is equal to or less than a predetermined threshold thD.

If it is determined in step S314 that the distance is equal to or less than the threshold thD, the distance calculation unit 353 instructs the tracking unit 342 to detect the tracking target subject, and the process proceeds to step S315.

The distance between the foreground histogram and the background histogram is equal to or less than the threshold thD when, in the previous frame, the color distributions of the foreground (subject region) and the background of the input image are similar to some degree. When the color distributions of the subject region and the background are similar, it is difficult to separate the subject portion of the input image from the background portion accurately, so detecting the tracking target subject using color information lowers the detection accuracy.

In contrast, when the tracking target subject is detected from the input image using the contours of the subjects on the input image, no color information is used, so the subject can be detected with high accuracy even when the color distributions of the foreground and the background are similar.

Therefore, when the distance between the foreground histogram and the background histogram is equal to or less than the threshold thD, the distance calculation unit 353 instructs the tracking unit 342 to perform contour-based tracking.

In step S315, the contour image generation unit 381 uses the input image and subject region information of the previous frame held in the holding unit 54 to apply filter processing to the subject region of that input image, thereby extracting the contour of the subject within the subject region and generating a foreground contour image.

The foreground contour image has the same size as the subject region of the previous frame. A pixel has the value "1" if it is at the same position as an edge of the subject within the subject region, and the value "0" if it is at the same position as a part that is not an edge of the subject.

After generating the foreground contour image of the previous frame, the contour image generation unit 381 supplies it to the subject region determination unit 383.

In step S316, the contour image generation unit 382 applies filter processing to the supplied input image of the current frame to extract the contours of the subjects on the input image, and generates a contour image. The contour image generation unit 382 supplies the resulting contour image to the subject region determination unit 383.

As with the foreground contour image, in the contour image of the current frame a pixel has the value "1" if it is at the same position as an edge of a subject on the input image, and the value "0" if it is at the same position as a part that is not an edge. The contour image has the same size as the input image.
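A binary contour image of the kind described in steps S315 and S316 could be produced as in the following sketch. The text only says that "filter processing" is applied, so the central-difference gradient, the threshold value, and the function name used here are assumptions chosen for illustration; any edge filter that yields a 0/1 image would serve the same role.

```python
def contour_image(gray, threshold=32):
    """Return a 0/1 image of the same size as `gray`, with 1 where the
    gradient magnitude |gx| + |gy| reaches `threshold` (assumed filter)."""
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Central differences, clamped at the image border.
            gx = gray[y][min(x + 1, width - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, height - 1)][x] - gray[max(y - 1, 0)][x]
            if abs(gx) + abs(gy) >= threshold:
                out[y][x] = 1
    return out
```

Applying the same function to the subject region cropped from the previous frame gives a foreground contour image of the same size as that region.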

In step S317, the subject region determination unit 383 identifies the subject region on the input image of the current frame on the basis of the contour image of the current frame supplied from the contour image generation unit 382 and the foreground contour image of the previous frame supplied from the contour image generation unit 381.

Specifically, the subject region determination unit 383 takes a region of the current frame's contour image that has the same size as the foreground contour image as the region to be processed (hereinafter also called the comparison region), and obtains the sum of absolute differences between the pixel values of the comparison region and those of the foreground contour image. That is, it sums the absolute values of the differences between the pixel values of pixels at the same positions in the comparison region and the foreground contour image.

Here, if, for example, the same subject appears at the same position in the comparison region and the foreground contour image, the sum of absolute differences of the pixel values should be "0". Moreover, the more closely the part of the current frame's input image at the position of the comparison region resembles the subject region of the previous frame's input image, the smaller the sum of absolute differences between the comparison region and the foreground contour image should be.

The subject region determination unit 383 shifts the position of the comparison region across the contour image of the current frame, taking each position of the contour image as the comparison region and obtaining the sum of absolute differences between that comparison region and the foreground contour image. The subject region determination unit 383 then takes, as the subject region, the region of the current frame's input image at the same position as the comparison region for which the sum of absolute differences is smallest.
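The exhaustive search for the comparison region with the smallest sum of absolute differences (SAD) can be sketched as below. Both images are assumed to be lists of rows of 0/1 values; the one-pixel step, the tie-breaking rule (the first minimum in raster order wins), and the function name are assumptions not stated in the text.

```python
def find_subject_by_contour(contour, foreground_contour):
    """Slide the foreground contour image over the current frame's contour
    image and return ((x, y, width, height), sad) for the best match."""
    height, width = len(contour), len(contour[0])
    t_height, t_width = len(foreground_contour), len(foreground_contour[0])
    if t_height > height or t_width > width:
        raise ValueError("foreground contour image is larger than the frame")
    best = None
    for y in range(height - t_height + 1):
        for x in range(width - t_width + 1):
            # Sum of absolute differences over pixels at the same positions.
            sad = sum(
                abs(contour[y + j][x + i] - foreground_contour[j][i])
                for j in range(t_height)
                for i in range(t_width)
            )
            if best is None or sad < best[0]:
                best = (sad, x, y)
    sad, x, y = best
    return (x, y, t_width, t_height), sad
```

The returned rectangle corresponds to the subject region of the current frame, and a SAD of 0 indicates an exact match of the contour pattern.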

The foreground contour image is an image indicating the contours in the subject region of the previous frame. Therefore, the region of the current frame's input image at the same position as the region whose sum of absolute differences from the foreground contour image is smallest, that is, the region of the current frame's contour image most similar to the foreground contour image, should contain the tracking target subject. The subject region determination unit 383 therefore takes the region on the current frame's input image at the same position as the contour image region most similar to the foreground contour image as the subject region of the current frame.

In step S318, the subject region determination unit 383 generates subject region information indicating the position of the identified subject region of the current frame, supplies it to the display control unit 23, and supplies it to the holding unit 54 to be held. The process then proceeds to step S322.

If it is determined in step S314 that the distance between the foreground histogram and the background histogram is greater than the threshold thD, that is, that it exceeds the threshold, the distance calculation unit 353 instructs the tracking unit 343 to detect the tracking target subject, and the process proceeds to step S319. At this time, the distance calculation unit 353 also supplies the foreground histogram of the previous frame to the subject region determination unit 412 of the tracking unit 343.

The distance between the foreground histogram and the background histogram is greater than the threshold thD when, in the previous frame, the color distributions of the foreground (subject region) and the background of the input image differ to some degree. In that case, the foreground and the background of the input image differ markedly when color information is used as the index, so the foreground and the background can easily be separated using color information.

On the other hand, when the tracking target subject is detected using the contour of the subject, it may not be possible to detect the subject accurately if the input image as a whole has few edges, even when the color distributions of the foreground and the background differ.

Therefore, when the distance between the foreground histogram and the background histogram is greater than the threshold thD, the distance calculation unit 353 instructs the tracking unit 343 to perform tracking using color histograms.
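The branch taken in step S314 reduces to a single comparison, sketched below. The function name and the returned labels are hypothetical; the comparison itself (a distance equal to or less than thD selects contour tracking, a greater distance selects color-histogram tracking) follows the text.

```python
def select_tracker(distance, th_d):
    """Choose the tracker from the foreground/background histogram distance."""
    # Similar color distributions (small distance): color cannot separate
    # the subject from the background, so the contour tracker is used.
    if distance <= th_d:
        return "contour"          # tracking unit 342
    # Clearly different color distributions: use the color histogram.
    return "color_histogram"      # tracking unit 343
```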

In step S319, the histogram generation unit 411 refers to the subject region information held in the holding unit 54 and, while shifting the position of the comparison target region across the supplied input image of the current frame, generates a color histogram for each comparison target region and supplies it to the subject region determination unit 412.

In step S320, the subject region determination unit 412 identifies the subject region of the input image using the foreground histogram of the previous frame supplied from the distance calculation unit 353 and the histograms of the comparison target regions of the current frame supplied from the histogram generation unit 411.

Specifically, for each comparison target region, the subject region determination unit 412 obtains the distance between the histogram of that comparison target region and the foreground histogram. This distance is a measure of the degree of similarity between histograms, such as the EMD. From among the comparison target regions of the current frame's input image, the subject region determination unit 412 then selects the one with the smallest distance from the foreground histogram as the subject region of the current frame.

The foreground histogram is the color histogram of the subject region of the previous frame's input image. Therefore, the region of the input image (comparison target region) whose histogram is most similar to the foreground histogram is the region most similar to the subject region of the previous frame when the color distribution is used as the index, and is highly likely to contain the tracking target subject.

The subject region determination unit 412 therefore takes the comparison target region on the current frame's input image with the smallest distance from the foreground histogram as the subject region of the current frame.
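Steps S319 and S320 together amount to a sliding-window search over color histograms, sketched below. The sketch assumes a single-channel image with integer values in `[0, max_value)`, equal-width bins, a non-empty foreground histogram, and the one-dimensional form of the EMD (sum of absolute cumulative differences of the normalized histograms); the function names and the `step` parameter are hypothetical.

```python
def region_histogram(image, x0, y0, w, h, num_bins, max_value=256):
    """Color histogram of the w-by-h comparison target region at (x0, y0)."""
    hist = [0] * num_bins
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            hist[image[y][x] * num_bins // max_value] += 1
    return hist


def emd_1d(hist_a, hist_b):
    """1-D Earth Mover's Distance between normalized histograms."""
    total_a, total_b = float(sum(hist_a)), float(sum(hist_b))
    carried, distance = 0.0, 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a / total_a - b / total_b
        distance += abs(carried)
    return distance


def find_subject_by_color(image, fg_hist, region_w, region_h, step=1):
    """Return ((x, y, w, h), distance) for the comparison target region
    whose histogram is closest to the previous frame's foreground histogram."""
    best = None
    for y0 in range(0, len(image) - region_h + 1, step):
        for x0 in range(0, len(image[0]) - region_w + 1, step):
            hist = region_histogram(image, x0, y0, region_w, region_h,
                                    len(fg_hist))
            d = emd_1d(hist, fg_hist)
            if best is None or d < best[0]:
                best = (d, (x0, y0, region_w, region_h))
    return best[1], best[0]
```

Here `region_w` and `region_h` would be taken from the subject region of the previous frame, so that every comparison target region has the same size as that region.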

In step S321, the subject region determination unit 412 generates subject region information indicating the position of the identified subject region of the current frame, supplies it to the display control unit 23, and supplies it to the holding unit 54 to be held. The process then proceeds to step S322.

Once the subject region information has been generated in step S318 or step S321, steps S322 and S323 are performed and the tracking process ends. That is, the display control unit 23 causes the display unit 24 to display the input image of the current frame and, on the basis of the subject region information, to display a subject frame on the input image. Since these steps are the same as steps S15 and S16 in FIG. 10, their detailed description is omitted.

If it is determined in step S323 that the process is not to be ended, the process returns to step S311, and the subject region of the next frame is detected.

In this way, for each frame, the image processing apparatus 11 determines whether the color distributions of the foreground and the background of the input image are similar and, according to the result, detects the subject from the input image by a method that uses either the contour of the subject or the color distribution, and displays the subject frame.

In this way, a characteristic of the input image, such as whether the color distributions of its foreground and background are similar, is identified, and the tracking target subject is detected by a tracking method well suited to images having that characteristic. The subject can therefore be tracked more simply, quickly, and stably.

The method of detecting the tracking target subject from the input image is not limited to the methods described above, such as the method using visual attention; any method may be used, for example block matching between the subject region of the previous frame and the input image of the current frame. In addition, the characteristic identified for the input image may be the kind of scene in which the image was captured, such as day or night, determined from, for example, the overall brightness of the input image.

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a program recording medium onto a computer built into dedicated hardware, or onto, for example, a general-purpose personal computer that can execute various functions when various programs are installed.

FIG. 27 is a block diagram illustrating a configuration example of the hardware of a computer that executes the series of processes described above by means of a program.

In the computer, a CPU (Central Processing Unit) 601, a ROM (Read Only Memory) 602, and a RAM (Random Access Memory) 603 are connected to one another by a bus 604.

An input/output interface 605 is also connected to the bus 604. Connected to the input/output interface 605 are an input unit 606 including a keyboard, a mouse, a microphone, and the like; an output unit 607 including a display, a speaker, and the like; a recording unit 608 including a hard disk, a non-volatile memory, and the like; a communication unit 609 including a network interface and the like; and a drive 610 that drives a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer configured as described above, the series of processes described above is performed when the CPU 601, for example, loads the program recorded in the recording unit 608 into the RAM 603 via the input/output interface 605 and the bus 604 and executes it.

The program executed by the computer (CPU 601) is provided either recorded on the removable medium 611, which is a package medium such as a magnetic disk (including a flexible disk), an optical disk (CD-ROM (Compact Disc-Read Only Memory), DVD (Digital Versatile Disc), etc.), a magneto-optical disk, or a semiconductor memory, or via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.

The program can be installed in the recording unit 608 via the input/output interface 605 by mounting the removable medium 611 in the drive 610. The program can also be received by the communication unit 609 via a wired or wireless transmission medium and installed in the recording unit 608. Alternatively, the program can be installed in the ROM 602 or the recording unit 608 in advance.

The program executed by the computer may be a program whose processes are performed in time series in the order described in this specification, or a program whose processes are performed in parallel or at the necessary timing, such as when a call is made.

Embodiments of the present invention are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS: 11 image processing apparatus, 21 switching unit, 22-1 to 22-N, 22 tracking unit, 51 flatness determination unit, 52 tracking unit, 53 tracking unit, 91 block motion detection unit, 92 subject motion detection unit, 93 subject region determination unit, 121 subject extraction unit, 122 subject candidate region determination unit, 123 subject region determination unit, 311 face detection unit, 312 subject region determination unit, 341 color determination unit, 342 tracking unit, 343 tracking unit

Claims

An image processing apparatus that detects a subject from each of a plurality of consecutive frames of input images, the apparatus comprising:
tracking means for detecting a tracking target subject from the input image of a current frame to be processed, based on the input image of the current frame and a detection result of the tracking target subject in a previous frame temporally preceding the current frame; and
switching means for extracting a feature amount of a predetermined first feature from the input image, specifying a characteristic of the input image based on the feature amount, and causing one of a plurality of the tracking means, each of which detects the tracking target subject from the input image, to detect the tracking target subject.

The image processing apparatus according to claim 1, wherein the switching means includes:
flatness calculation means for extracting, as the feature amount, a variance value of the pixel values of the pixels in each region of the input image, and calculating from the variance value a flatness indicating the degree to which the input image is flat; and
determination means for determining from the flatness whether the input image is a flat image with little change in pixel values in the spatial direction, and causing one of the plurality of tracking means to detect the tracking target subject according to the determination result.
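
The flatness determination in the claim above can be illustrated with a minimal sketch. The claim only states that a flatness is calculated from per-region variance values; the specific definition used here (the fraction of regions whose variance falls below a threshold), the block size, both thresholds, and all function names are assumptions made for illustration, not the method of the specification.

```python
from statistics import pvariance


def block_variances(image, block):
    """Variance of pixel values in each block x block region of a 2-D grayscale image."""
    height, width = len(image), len(image[0])
    if height < block or width < block:
        raise ValueError("image must be at least one block in size")
    variances = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            pixels = [image[y][x]
                      for y in range(top, top + block)
                      for x in range(left, left + block)]
            variances.append(pvariance(pixels))
    return variances


def flatness(image, block=4, var_threshold=25.0):
    """Assumed definition: fraction of regions whose variance is below var_threshold."""
    variances = block_variances(image, block)
    low = sum(1 for v in variances if v < var_threshold)
    return low / len(variances)


def is_flat(image, flat_threshold=0.8, **kwargs):
    """Treat the image as flat when enough of its regions have low variance (assumed rule)."""
    return flatness(image, **kwargs) >= flat_threshold
```

Under this sketch, a uniform image yields a flatness of 1.0 and a fine checkerboard yields 0.0, so the determination result could be used to choose between the tracking means.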

The image processing apparatus according to claim 2, wherein the tracking means include:
first tracking means for, when the input image is not a flat image, performing motion detection using a subject region that includes the tracking target subject on the input image of the previous frame and the input image of the current frame, and detecting the region of the tracking target subject on the input image of the current frame by obtaining the movement of the subject region; and
second tracking means for, when the input image is a flat image, extracting feature amounts of a plurality of second features from the input image of the current frame, generating from those feature amounts a subject map indicating the subject likelihood of each region of the input image, and detecting, as the region of the tracking target subject on the input image of the current frame, a region that, among the subject-like regions of the input image specified by the subject map, includes a region at the same position as the subject region of the previous frame.
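
As a rough illustration of the motion-based tracking used for non-flat images, the sketch below finds the movement of the previous subject region by exhaustive block matching. The claim does not name a motion detection technique; the sum-of-absolute-differences cost, the search range, the `(left, top, width, height)` region layout, and the function names are all assumptions.

```python
def sad(prev, cur, region, dx, dy):
    """Sum of absolute differences between the previous subject region and the
    same-sized region of the current frame shifted by (dx, dy)."""
    left, top, width, height = region
    total = 0
    for y in range(top, top + height):
        for x in range(left, left + width):
            total += abs(prev[y][x] - cur[y + dy][x + dx])
    return total


def track_by_motion(prev, cur, region, search=2):
    """Return the subject region on the current frame, assuming pure translation."""
    left, top, width, height = region
    frame_h, frame_w = len(cur), len(cur[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip shifts that would move the region outside the frame.
            if not (0 <= left + dx and left + dx + width <= frame_w and
                    0 <= top + dy and top + dy + height <= frame_h):
                continue
            cost = sad(prev, cur, region, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best  # the zero shift is always valid, so best is never None
    return (left + dx, top + dy, width, height)
```

Block matching of this kind depends on texture inside the region, which is consistent with the claim reserving motion detection for images that are not flat.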

The image processing apparatus according to claim 3, wherein
the switching means further includes face detection means for detecting a human face from the input image of the current frame,
the tracking means further include third tracking means for detecting, based on the result of detecting the human face from the input image of the current frame, the region closest to the subject region among the regions of human faces detected from the input image of the current frame, as the region of the tracking target subject on the input image of the current frame,
the face detection means causes the third tracking means to detect the tracking target subject when a human face is detected from the input image, and
the flatness calculation means calculates the flatness when no human face is detected from the input image.
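
The selection order described in the claim above (face detection first, flatness only when no face is found) and the choice of the nearest face region can be sketched as follows. Measuring closeness by the distance between region centers, reading "the subject region" as the previous frame's subject region, the `(left, top, width, height)` layout, and all names are assumptions made for illustration; the face detector itself is outside the sketch.

```python
def center(region):
    """Center point of a (left, top, width, height) region."""
    left, top, width, height = region
    return (left + width / 2.0, top + height / 2.0)


def closest_face(face_regions, prev_subject_region):
    """Return the detected face region nearest to the previous subject region, or None."""
    if not face_regions:
        return None
    px, py = center(prev_subject_region)

    def squared_distance(region):
        cx, cy = center(region)
        return (cx - px) ** 2 + (cy - py) ** 2

    return min(face_regions, key=squared_distance)


def choose_tracking(face_regions, is_flat_fn):
    """Pick a tracking means; is_flat_fn is only called when no face was detected."""
    if face_regions:
        return "third"   # face-based tracking
    return "second" if is_flat_fn() else "first"
```

Passing the flatness test as a callable mirrors the claim's wording that the flatness is calculated only when no human face is detected.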

The image processing apparatus according to claim 1, wherein the switching means includes:
foreground histogram generation means for extracting, as the feature amount, color components of the pixels of the input image, and generating a foreground histogram indicating the color distribution of the pixels in a subject region that includes the tracking target subject on the input image of the previous frame;
background histogram generation means for extracting, as the feature amount, color components of the pixels of the input image, and generating a background histogram indicating the color distribution of the pixels in the region of the input image of the previous frame excluding the subject region; and
determination means for causing one of the plurality of tracking means to detect the tracking target subject according to a distance indicating the degree of similarity between the foreground histogram and the background histogram.

The image processing apparatus according to claim 5, wherein the tracking means include:
fourth tracking means for, when the distance is less than or equal to a predetermined threshold, detecting the tracking target subject on the input image of the current frame by searching the input image of the current frame for the region whose contour image, showing the contour of the subject in that region, is most similar to a foreground contour image showing the contour of the subject in the subject region of the previous frame; and
fifth tracking means for, when the distance is greater than the threshold, detecting the tracking target subject on the input image of the current frame by searching the input image of the current frame for the region that yields the histogram whose color distribution is most similar to the foreground histogram.
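
The color-based switching in the two claims above can be sketched as follows: build foreground and background histograms from the previous frame, measure their distance, and pick contour-based tracking when the colors are hard to tell apart. The claims do not specify the color quantization, the distance measure, or the threshold; the 4-level RGB quantization, the L1 distance, the threshold value, and the function names are assumptions.

```python
def color_histogram(pixels, bins=4):
    """Normalized histogram over quantized (r, g, b) colors with 8-bit channels."""
    step = 256 // bins
    counts = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        counts[key] = counts.get(key, 0) + 1
    total = len(pixels)
    return {key: count / total for key, count in counts.items()}


def split_pixels(image, region):
    """Split pixels into those inside the subject region and those outside it."""
    left, top, width, height = region
    foreground, background = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            inside = left <= x < left + width and top <= y < top + height
            (foreground if inside else background).append(pixel)
    return foreground, background


def histogram_distance(h1, h2):
    """L1 distance between normalized histograms: 0 for identical, 2 for disjoint."""
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)


def choose_by_color(prev_image, prev_region, threshold=0.5):
    """Small distance: subject and background colors are similar, so use contours."""
    foreground, background = split_pixels(prev_image, prev_region)
    distance = histogram_distance(color_histogram(foreground),
                                  color_histogram(background))
    return "contour" if distance <= threshold else "color"
```

With this distance, a subject whose colors match the background scores near 0 and is routed to contour matching, while a subject with distinct colors scores near 2 and is routed to the color-histogram search.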

An image processing method for an image processing apparatus that detects a subject from each of a plurality of consecutive frames of input images, the apparatus including:
tracking means for detecting a tracking target subject from the input image of a current frame to be processed, based on the input image of the current frame and a detection result of the tracking target subject in a previous frame temporally preceding the current frame; and
switching means for extracting a feature amount of a predetermined feature from the input image, specifying a characteristic of the input image based on the feature amount, and causing one of a plurality of the tracking means, which detect the tracking target subject from the input image by mutually different methods, to detect the tracking target subject according to the characteristic of the input image,
the method comprising the steps of:
the switching means causing one of the plurality of tracking means to detect the tracking target subject according to the characteristic of the input image; and
the tracking means detecting the tracking target subject on the input image of the current frame based on the input image of the current frame and the detection result of the tracking target subject in the previous frame.

An image processing program for detecting a subject from each of a plurality of consecutive frames of input images, the program causing a computer to execute processing including the steps of:
extracting a feature amount of a predetermined feature from the input image, specifying a characteristic of the input image based on the feature amount, and causing one of a plurality of tracking means, which detect a tracking target subject from the input image by mutually different methods, to detect the tracking target subject according to the characteristic of the input image; and
the tracking means detecting the tracking target subject from the input image of the current frame to be processed, based on the input image of the current frame and a detection result of the tracking target subject in a previous frame temporally preceding the current frame.