JP3726973B2

JP3726973B2 - Subject recognition apparatus and method

Info

Publication number: JP3726973B2
Application number: JP13451296A
Authority: JP
Inventors: 太郎水藤; 忠房富高; 正和小柳
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1996-05-29
Filing date: 1996-05-29
Publication date: 2005-12-14
Anticipated expiration: 2016-05-29
Also published as: JPH09322050A

Description

【０００１】
【発明の属する技術分野】
本発明は、被写体認識装置および方法に関し、特に、適応的に被写体のモデルを変形することにより、より確実に被写体を認識することができるようにした被写体認識装置および方法に関する。
【０００２】
【従来の技術】
所定の被写体をビデオカメラにより自動的に追尾させようとする場合、追尾すべき被写体を予め登録する処理が必要となる。従来、このような登録をするのに、被写体の画像を背景画像とともに撮像し、撮像した結果得られる画像中において、被写体を枠で囲むなどして指定し、その枠の中の画像データの特徴量を求め、その特徴量を有する画像を被写体の画像として認識するようにしている。
【０００３】
【発明が解決しようとする課題】
このように、従来の装置においては、背景（枠の外部のデータ）を考慮せずに、被写体（枠の内部のデータ）の特徴量を抽出するようにしている。このため、背景に被写体に似た画像が存在するような場合、被写体を正確に認識することができなくなる課題があった。
【０００４】
本発明はこのような状況に鑑みてなされたものであり、被写体を確実に認識することができるようにするものである。
【０００５】
【課題を解決するための手段】
請求項１に記載の被写体認識装置は、被写体と背景を含む画像の画像データを記憶する記憶手段と、記憶手段に記憶された画像データの画像中の、所定の枠の内の領域の画像の特徴量に基づいて、被写体の画像の特徴量のモデルである被写体モデルを生成するモデル生成手段と、記憶手段に記憶された画像データの画像中の、枠の外の領域において、モデル生成手段により生成された被写体モデルに含まれる画素の数を計数する計数手段と、計数手段により計数された値が、基準値より大きいとき、被写体モデルの範囲を狭くするよう、被写体モデルを変形する変形手段と、変形手段により変形された被写体モデルに基づいて、被写体を認識する認識手段とを備えることを特徴とする。
【０００６】
請求項８に記載の被写体認識方法は、被写体と背景を含む画像の画像データを記憶し、記憶された画像データの画像中の、所定の枠の内の領域の画像の特徴量に基づいて、被写体の画像の特徴量のモデルである被写体モデルを生成し、記憶された画像データの画像中の、枠の外の領域において、生成された被写体モデルに含まれる画素の数を計数し、計数された値が、基準値より大きいとき、被写体モデルの範囲を狭くするよう、被写体モデルを変形し、変形された被写体モデルに基づいて、被写体を認識することを特徴とする。
請求項９に記載の被写体認識装置は、被写体と背景を含む画像の画像データを記憶する記憶手段と、記憶手段に記憶された画像データの画像中の、所定の枠の内の領域の画像の特徴量に基づいて、被写体の画像の特徴量のモデルである被写体モデルを生成するモデル生成手段と、記憶手段に記憶された画像データの画像中の、枠の外の領域において、モデル生成手段により生成された被写体モデルに含まれる画素の数を計数する計数手段と、計数手段により計数された値が、基準値より小さいとき、被写体モデルの範囲を広くするよう、被写体モデルを変形する変形手段と、変形手段により変形された被写体モデルに基づいて、被写体を認識する認識手段とを備えることを特徴とする。
請求項１６に記載の被写体認識方法は、被写体と背景を含む画像の画像データを記憶し、記憶された画像データの画像中の、所定の枠の内の領域の画像の特徴量に基づいて、被写体の画像の特徴量のモデルである被写体モデルを生成し、記憶された画像データの画像中の、枠の外の領域において、生成された被写体モデルに含まれる画素の数を計数し、計数された値が、基準値より小さいとき、被写体モデルの範囲を広くするよう、被写体モデルを変形し、変形された被写体モデルに基づいて、被写体を認識することを特徴とする。
【０００７】
請求項１に記載の被写体認識装置および請求項８に記載の被写体認識方法においては、被写体と背景を含む画像の画像データが記憶され、その記憶された画像データの画像中の、所定の枠の内の領域の画像の特徴量に基づいて、被写体の画像の特徴量のモデルである被写体モデルが生成され、記憶された画像データの画像中の、枠の外の領域において、生成された被写体モデルに含まれる画素の数が計数され、計数された値が、基準値より大きいとき、被写体モデルの範囲を狭くするよう、被写体モデルが変形され、変形された被写体モデルに基づいて、被写体が認識される。
請求項９に記載の被写体認識装置および請求項１６に記載の被写体認識方法においては、被写体と背景を含む画像の画像データが記憶され、その記憶された画像データの画像中の、所定の枠の内の領域の画像の特徴量に基づいて、被写体の画像の特徴量のモデルである被写体モデルが生成され、記憶された画像データの画像中の、枠の外の領域において、生成された被写体モデルに含まれる画素の数が計数され、計数された値が、基準値より小さいとき、被写体モデルの範囲を広くするよう、被写体モデルが変形され、変形された被写体モデルに基づいて、被写体が認識される。
【０００８】
【発明の実施の形態】
図１は、本発明の被写体認識装置を適用したビデオカメラの一実施例の構成を示している。レンズブロック１（撮像手段）は、レンズ２、アイリス３、およびＣＣＤ（ＣｈａｒｇｅＣｏｕｐｌｅｄＤｅｖｉｃｅ）４から構成され、被写体からの光Ｌを撮像し、電気信号としての画像信号を出力する。すなわち、被写体からの光Ｌは、レンズ２により、アイリス３を介してＣＣＤ４上に結像される。これによりＣＣＤ４からは、その受光量に対応した画像信号が出力される。なお、アイリス３は、いわゆるオートアイリス（ＡＥ）機構を構成しており、ＣＣＤ４で受光される光量を適正な値に調整するようになされている。
【０００９】
レンズブロック１から出力された画像信号は、サンプルホールド（Ｓ／Ｈ）および自動利得調整（ＡｕｔｏｍａｔｉｃＧａｉｎＣｏｎｔｒｏｌ（ＡＧＣ））回路５においてサンプルホールドされ、さらにオートアイリス機構からの制御信号によって、所定のゲインを持つように利得制御された後、Ａ／Ｄ変換器６に出力される。
【００１０】
なお、本実施例では、オートアイリス機構で露光量を制御するようにしたが、これを機能させず、固定の露出で撮像を行うようにすることも可能である。
【００１１】
Ａ／Ｄ変換器６は、サンプルホールドおよび自動利得調整回路５からの画像信号（アナログ信号）を所定のクロックに従ってＡ／Ｄ変換する。Ａ／Ｄ変換器６によってディジタル信号とされた画像信号は、ディジタルカメラ処理回路７に供給される。ディジタルカメラ処理回路７は、Ａ／Ｄ変換器６からの画像信号に基づいて、その画像信号に対応する画像を構成する各画素の輝度信号Ｙ、ならびに色差信号Ｒ−Ｙ，Ｂ−Ｙ、およびクロマ信号Ｃを生成する。例えばＮＴＳＣ方式の輝度信号Ｙおよびクロマ信号Ｃは、Ｄ／Ａ変換器８に出力され、そこでＤ／Ａ変換された後、モニタ１２に供給される。これにより、モニタ１２には、レンズブロック１で撮像された画像が表示される。このモニタ１２にはまた、枠表示ＩＣ１３より出力された枠も重畳表示されるようになされている。
【００１２】
また、ディジタルカメラ処理回路７で生成された輝度信号Ｙと、色差信号Ｒ−Ｙ，Ｂ−Ｙは、被写体認識回路９に供給される。被写体認識回路９は、ディジタルカメラ処理回路７からの輝度信号Ｙと色差信号Ｒ−Ｙ，Ｂ−Ｙで構成される画像の中から、追尾すべき被写体を検出する。
【００１３】
このため、被写体認識回路９は、フレームメモリで構成される画像メモリ１０（記憶手段）と、マイクロプロセッサで構成される追尾信号処理回路１１とを有する。画像メモリ１０は追尾信号処理回路１１から書き込み許可信号Ｓ１を受信すると、ディジタルカメラ処理回路７が出力する輝度信号Ｙと、色差信号Ｒ−Ｙ，Ｂ−Ｙを、それぞれ独立に画素単位で記憶する。
【００１４】
ここで、以下適宜、色差信号Ｒ−Ｙ，Ｂ−ＹをそれぞれＲ，Ｂと略記する。また、レンズブロック１が出力する画像の最も左上の画素の位置を原点（０，０）とし、その位置の左からｉ番目で、かつ、上からｊ番目の画素の輝度信号Ｙ、色差信号Ｒ，Ｂを、以下適宜、それぞれＹｉｊ、Ｒｉｊ、Ｂｉｊと表す。さらに以下適宜、輝度信号Ｙ、色差信号Ｒ，Ｂをまとめて、画像データともいう。
【００１５】
画像メモリ１０は、１フレーム（または１フィールド）分の画像データを記憶すると、読みだし許可信号Ｓ２を、追尾信号処理回路１１に出力する。その後、画像メモリ１０は、追尾信号処理回路１１が出力するメモリアドレス（上述のｉ，ｊに対応する）Ｓ３を受信すると、そのアドレスに記憶された画像データＳ４を追尾信号処理回路１１に出力する。
【００１６】
追尾信号処理回路１１は、画像メモリ１０から読みだし許可信号Ｓ２を受信すると、被写体の追尾に必要な画像データＳ４を、上述したように、画像メモリ１０にアドレス（メモリアドレス）Ｓ３を与えることで読みだし、これにより、レンズブロック１より出力された画像から、追尾すべき被写体を検出する。その後、追尾信号処理回路１１は書き込み許可信号Ｓ１を画像メモリ１０に供給し、これにより、画像メモリ１０ではレンズブロック１で撮像された画像が新たに記憶される（すでに記憶されている画像に上書きされる）。このとき、画像メモリ１０は上述したように読みだし許可信号Ｓ２を再び出力する。以下同様にして、画像メモリ１０ではレンズブロック１で撮像された画像が順次記憶されていく。
【００１７】
また、追尾信号処理回路１１は被写体を検出すると、その被写体がレンズブロック１から出力される画像の中央に表示されるように、パンモータ１４およびチルトモータ１５（駆動手段）を駆動する。各種のキー、スイッチ、ボタンなどよりなる入力部１７は、被写体設定ボタン１６を有する。このボタン１６は、被写体設定処理が完了したとき操作される。
【００１８】
次に図２のフローチャートを参照して、追尾信号処理回路１１内で行われる一連の処理について説明する。まず最初に、ステップ１において、追尾すべき被写体の設定処理が完了したか否か（被写体設定ボタン１６が操作されたか否か）が判定される。被写体設定処理がまだ完了していないとき、使用者は、入力部１７を操作して、追尾信号処理回路１１に被写体設定処理を指令する。このとき、追尾信号処理回路１１は、枠表示ＩＣ１３を制御し、被写体設定枠（指定手段）を発生させ、モニタ１２に出力し、表示させる。これにより、モニタ１２には、例えば図３に示すように、被写体設定枠Ｄが表示される。
【００１９】
一方、レンズ２とアイリス３を介してＣＣＤ４に入射された被写体からの光が、ＣＣＤ４において光電変換され、サンプルホールドおよび自動利得制御回路５によりサンプリングされ、かつ、適当なゲインを持つように入力制御された後、Ａ／Ｄ変換器６に入力される。Ａ／Ｄ変換器６において、Ａ／Ｄ変換された信号は、ディジタルカメラ処理回路７に入力され、輝度信号Ｙとクロマ信号Ｃが生成される。この輝度信号Ｙとクロマ信号Ｃは、Ｄ／Ａ変換器８によりＤ／Ａ変換された後、モニタ１２に出力され、表示される。従って、図３に示すように、モニタ１２には、被写体の画像が背景画像とともに、表示される。そして、そこには、上述したように、被写体設定枠Ｄも重畳表示される。
【００２０】
図３に示される被写体設定枠Ｄは、レンズブロック１が撮像する画像の所定の位置（この実施例の場合、画面の中央）に配置されており、ユーザは追尾すべき被写体を設定するために、その被写体設定枠Ｄ内にその被写体が表示されるように、レンズブロック１をパンニングまたはチルティングする。すなわち、ユーザは入力部１７を操作して、追尾信号処理回路１１にレンズブロック１を所定の方向にパンニングまたはチルティングさせるように指令する。追尾信号処理回路１１は、この指令に対応して、パンモータ１４とチルトモータ１５に制御信号を出力する。その結果、パンモータ１４とチルトモータ１５は、レンズブロック１を所望のパン位置とチルト位置に駆動する。
【００２１】
このようにして、追尾すべき被写体を被写体設定枠Ｄ内に配置するようにした後、ユーザは、被写体設定処理が完了したことを入力するために、被写体設定ボタン１６を操作する。
【００２２】
ステップ１において被写体設定ボタン１６が操作されたと判定された場合、ステップ２において被写体設定枠Ｄの内部の画像データが、画像メモリ１０から読み出される。例えば、図３の実施例の場合、所定の人物の顔の画像データが、被写体のデータとして画像メモリ１０から読みだされる。
【００２３】
この被写体設定枠Ｄの内部の画像データは、その特徴量を表すために、輝度信号Ｙｉｊ、および色差信号Ｒｉｊ，Ｂｉｊの組（Ｙｉｊ，Ｒｉｊ，Ｂｉｊ）で規定される点の集合とされる。そして、この点は、図４に示すように、（Ｒ−Ｙ，Ｙ）（Ｒ，Ｙ）平面（図４（Ａ））と、（Ｂ−Ｙ，Ｙ）（Ｂ，Ｙ）平面（図４（Ｂ））上にプロットされる。換言すれば、これらの平面上の位置（座標）が、被写体の特徴量を表すものとされる。
【００２４】
ただし、点（Ｙｉｊ，Ｒｉｊ，Ｂｉｊ）の集合にはノイズが含まれ、この集合は、被写体を表す代表的な点の集合に過ぎない。そこで、ステップ３では、点（Ｙｉｊ，Ｒｉｊ，Ｂｉｊ）の集合に幅を持たせるために、（Ｙｉｊ，ＨＲｉｊ，ＨＢｉｊ）、（Ｙｉｊ，ＬＲｉｊ，ＬＢｉｊ）を被写体情報として生成する。ここで、ＨＲｉｊ，ＬＲｉｊ，ＨＢｉｊ，ＬＢｉｊは、それぞれ次式に従って計算される。
ＨＲｉｊ＝Ｒｉｊ×（１＋α）
ＬＲｉｊ＝Ｒｉｊ×（１−α）
ＨＢｉｊ＝Ｂｉｊ×（１＋α）
ＬＢｉｊ＝Ｂｉｊ×（１−α）
【００２５】
なお、上式においては、αは正の定数であり、所定の画素を被写体の画素として認識するための許容誤差を表している。
【００２６】
以上のように、図４に示すデータ（ステップ２で取得したデータ）に対して、上記演算を施すことにより、図５に示すように、許容誤差αを考慮したデータが得られる。図５（Ａ）は、点（Ｙｉｊ，ＨＲｉｊ）、（Ｙｉｊ，ＬＲｉｊ）をプロットしたものを、また、図５（Ｂ）は、点（Ｙｉｊ，ＨＢｉｊ）、（Ｙｉｊ，ＬＢｉｊ）をプロットしたものを、それぞれ示す。なお、本実施例では、ＲおよびＢを表す値として、−１２８〜１２７の範囲を割り当てている。
【００２７】
次にステップ４（モデル生成手段）で、図５（Ａ）、図５（Ｂ）に示す許容誤差αを考慮した点集合に対し、Ｙを引き数として、ＲまたはＢに関する例えば２次関数で近似した被写体モデルを作る。本実施例では、異なる被写体について近似した場合でも、ある程度、似通った形の被写体モデル（本実施例では２次関数）が得られるように、２次関数のＹ切片（被写体モデルである２次関数がＹ軸と交わる点）が決められている。
【００２８】
具体的に述べると、それぞれのＹ切片は、Ｙ−Ｒ座標系については、図５（Ａ）に示すように、ＲｌｏｗおよびＲｈｉｇｈ（ただし、Ｒｌｏｗ＜Ｒｈｉｇｈとする）が予め設定され、Ｙ−Ｂ座標系については、図５（Ｂ）に示すように、ＢｌｏｗおよびＢｈｉｇｈ（ただし、Ｂｌｏｗ＜Ｂｈｉｇｈとする）が予め設定される。
【００２９】
このように、Ｙ切片を固定した状態で、図５（Ａ）の（Ｙｉｊ，ＨＲｉｊ）および（Ｙｉｊ，ＬＲｉｊ）と、図５（Ｂ）の（Ｙｉｊ，ＨＢｉｊ）および（Ｙｉｊ，ＬＢｉｊ）、それぞれについて、２次関数で近似（例えば最小自乗近似）を行い、次式で示される被写体モデルとしての２次関数ＨＦｒ（Ｙ）（Ｙに関するＲの上限特徴モデル）、ＬＦｒ（Ｙ）（Ｙに関するＲの下限特徴モデル）、ＨＦｂ（Ｙ）（Ｙに関するＢの上限特徴モデル）、ＬＦｂ（Ｙ）（Ｙに関するＢの下限特徴モデル）が生成される。
ＨＦｒ（Ｙ）＝Ａ０×（Ｙ−Ｒｌｏｗ）×（Ｙ−Ｒｈｉｇｈ）
ＨＦｂ（Ｙ）＝Ａ１×（Ｙ−Ｂｌｏｗ）×（Ｙ−Ｂｈｉｇｈ）
ＬＦｒ（Ｙ）＝Ａ２×（Ｙ−Ｒｌｏｗ）×（Ｙ−Ｒｈｉｇｈ）
ＬＦｂ（Ｙ）＝Ａ３×（Ｙ−Ｂｌｏｗ）×（Ｙ−Ｂｈｉｇｈ）
【００３０】
ここで、Ａ０は（Ｙｉｊ，ＨＲｉｊ）の、Ａ１は（Ｙｉｊ，ＨＢｉｊ）の、Ａ２は（Ｙｉｊ，ＬＲｉｊ）の、Ａ３は（Ｙｉｊ，ＬＢｉｊ）の、それぞれデータに対する近似により求められた定数である。
【００３１】
このようにして定められた、（Ｒ，Ｙ）平面上において、ＨＦｒ（Ｙ）とＬＦｒ（Ｙ）の間に存在し、かつ、（Ｂ，Ｙ）平面上において、ＨＦｂ（Ｙ）とＬＦｂ（Ｙ）の間に存在する点の画素が、被写体に対応する画素とされる。
【００３２】
なお、以上のようにして、モデルを作成する方法の詳細は、本出願人が特願平８−１１６５５号として先に開示している。
【００３３】
ステップ５（変形手段）では、ステップ４で作られた被写体モデル（図６（Ａ），（Ｂ））を、背景に応じて修正する。つまり、被写体初期設定時の画面において、被写体設定枠の外側の領域（背景）に、被写体モデルを満たす領域が存在する場合、被写体モデルを細く絞り（図７（Ａ），（Ｂ））、存在しない場合、被写体モデルを広げるように（図８（Ａ），（Ｂ））修正する。
【００３４】
すなわち、ステップ５の背景によるモデル変更の処理の詳細を示すと、図９のフローチャートに示すようになる。ステップ２１においては、図３における被写体設定枠Ｄの外側の領域の全ての画素のデータを画像メモリ１０から読みだす。そして、ステップ２２（計数手段）において、ステップ４で生成された被写体のモデル（図６）に含まれる画素の数Ｎを計数する。すなわち、各画素（Ｙｉｊ，Ｒｉｊ，Ｂｉｊ）のうち、次の式を満足する画素の数を計数する。
ＬＦｒ（Ｙｉｊ）＜Ｒｉｊ＜ＨＦｒ（Ｙｉｊ）
ＬＦｂ（Ｙｉｊ）＜Ｂｉｊ＜ＨＦｂ（Ｙｉｊ）
【００３５】
次に、ステップ２３において、ステップ２２で計数した画素の数Ｎが、予め設定してある所定の基準値β（β＞０）より大きいか否かを判定する。計数された数Ｎが基準値βより大きいと判定された場合、ステップ２４に進み、ステップ４で生成された被写体のモデルの許容誤差αを、α１（α１＜α）に設定する。これにより、図７に示すように、被写体モデルが狭くなることになる。
【００３６】
これに対して、ステップ２３において、計数した数Ｎが基準値β以下であると判定された場合、ステップ２５に進み、計数した数Ｎが基準値γ（γ＞０）より小さいか否かを判定する。数Ｎが基準値γより小さい場合においては、ステップ２６に進み、許容誤差αを、α２（α２＞α）に設定する。これにより、図８に示すように、ステップ４で生成された被写体モデル（図６）の幅が広げられる。
【００３７】
このようにして、背景に被写体に近似した画像が存在する場合においては、被写体のモデルの幅が狭くなるように修正するようにし、被写体でない部分が被写体として誤って認識されることを抑制する。
【００３８】
逆に、背景に被写体に似た画像が存在しない場合においては、被写体モデルの幅を広げることにより、より確実に被写体を認識することができるようにする。
【００３９】
次に、図２に戻って、ステップ６（認識手段）では、ステップ５で生成された被写体モデル（修正された被写体モデル）を用いて、画像メモリ１０の中から被写体の一部と予想される画素を抽出する。すなわち、レンズブロック１で撮像され、画像メモリ１０に記憶された画像を構成する各画素のうち、その輝度Ｙｉｊと、色差信号ＲｉｊおよびＢｉｊが、それぞれ次の２つの式の両方を満足するものを被写体の構成画素として抽出する。
ＬＦｒ（Ｙｉｊ）＜Ｒｉｊ＜ＨＦｒ（Ｙｉｊ）
ＬＦｂ（Ｙｉｊ）＜Ｂｉｊ＜ＨＦｂ（Ｙｉｊ）
【００４０】
すなわち、図５（Ａ）に示した２つの２次関数（但し、修正された関数）ＬＦｒ（Ｙｉｊ）とＨＦｒ（Ｙｉｊ）との間にプロットされ、かつ、図５（Ｂ）に示した２つの２次関数（但し、修正された関数）ＬＦｂ（Ｙｉｊ）とＨＦｂ（Ｙｉｊ）との間にプロットされる画素が、被写体を構成する画素として検出される。
【００４１】
ステップ６において、画像メモリ１０に記憶された画像から被写体構成画素の検出が行われた後、ステップ７において、その被写体構成画素の数により、被写体が存在するかどうかを判定する。すなわち、ステップ６で検出された被写体構成画素の数が所定の閾値δより大きい場合、画像メモリ１０に記憶された画像の中に、被写体が存在すると判定し、被写体構成画素の数が所定の閾値δ以下である場合、画像メモリ１０に記憶された画像の中に、被写体は存在しないと判定する。
【００４２】
ステップ７で画像メモリ１０の中に被写体が存在すると判定された場合、ステップ８において、ステップ７で検出された被写体構成画素のうち、その被写体構成画素で構成される領域の周辺にある、いわばノイズ的な領域を除去するために、被写体構成画素で構成される領域に対してフィルタリング処理を行う。例えば、図１０（Ａ）に影を付して示すように、被写体構成画素が検出されている場合には、このフィルタリング処理により、図１０（Ａ）に影を付して示す被写体構成領域は、図１０（Ｂ）に示すように変形される。
【００４３】
その後、ステップ９において、図１０（Ｂ）に示すように、ステップ７で検出された被写体構成画素集団を囲むように表示枠（認識された被写体であることを示す枠）を表示させる。このため、追尾信号処理回路１１は、枠表示ＩＣ１３を制御し、表示枠を表示させる位置に枠パルスを発生させる。モニタ１２は、枠パルスを映像信号に重畳する。例えば図１１に示すように表示枠を表示する場合、図１１の矢印で示す位置のラインの枠パルスは、図１２に示すようになり、この枠パルスをそのラインの映像信号に重畳すると、図１３に示すような映像信号が得られる。この映像信号をモニタ１２で表示することにより、図１１に示すような画像が表示される。
【００４４】
その後、ステップ１０において、ステップ８でフィルタリングされた被写体構成画素の集合の重心（例えば、水平方向をｘ軸、垂直方向をｙ軸とするｘｙ平面上の重心）が求められ（図１０（Ｂ）において×印で示す位置（座標）が求められ）、これが被写体の位置とされる。
【００４５】
さらに、ステップ１１において、ステップ１０で算出された被写体の位置が、レンズブロック１から出力される画像の中央の位置に一致するように、パンモータ１４、およびチルトモータ１５を回転駆動し、これによりレンズブロック１がパンニングおよびチルティングされ、モニタ１２上の被写体が表示画面中央に引き込まれる。
【００４６】
次に、ステップ１２に進み、処理の終了が指令されたか否かが判定され、指令されていなければ、ステップ６に戻り、それ以降の処理が繰り返し実行される。処理の終了が指令されていれば、処理が終了される。
【００４７】
また、ステップ７において、被写体が存在しないと判定された場合、ステップ１３に進み、枠を消去する処理が実行される。すなわち、追尾信号処理回路１１は、枠表示ＩＣ１３を制御し、枠パルスの発生を中止させる。枠を消去する場合は、枠パルスを０Ｖにすればよい。なお、この他、枠を消去する代わりに、枠の大きさを変化させたり、枠の大きさを最大にしたりして、被写体が認識されている場合と異なるように表示させてもよい。
【００４８】
上記実施例においては、被写体認識回路９を、ビデオカメラ内に内蔵させるようにしたが、ビデオカメラの外部の装置として設けるようにすることも可能である。
【００４９】
このように、このビデオカメラを用いて、例えば所定の室内を監視し、室内への進入者を自動追尾したり、テレビ会議システムにおいて、発言者を自動追尾するシステムを実現することができる。
【００５０】
なお、本発明は、ビデオカメラ以外にも適用することができる。
【００５１】
【発明の効果】
以上の如く、請求項１に記載の被写体認識装置および請求項８に記載の被写体認識方法によれば、背景に対して適応的に被写体モデルが生成され、背景に拘らず、被写体を確実に認識することが可能となる。また、被写体でない部分が被写体として誤って認識されることを抑制することができる。
さらに、請求項９に記載の被写体認識装置および請求項１６に記載の被写体認識方法によれば、背景に対して適応的に被写体モデルが生成され、背景に拘らず、被写体を確実に認識することが可能となる。また、より確実に被写体を認識することができる。
【図面の簡単な説明】
【図１】本発明の被写体認識装置を適用したビデオカメラの構成例を示すブロック図である。
【図２】図１の実施例の追尾信号処理回路１１の動作を説明するフローチャートである。
【図３】被写体の設定を説明する図である。
【図４】被写体設定時における被写体画像を説明する図である。
【図５】被写体モデルを説明する図である。
【図６】被写体モデルを説明する図である。
【図７】狭くした被写体モデルを説明する図である。
【図８】広くした被写体モデルを説明する図である。
【図９】図２のステップ５の背景によるモデル変更処理の詳細を示すフローチャートである。
【図１０】図２のステップ８のフィルタリング処理を説明する図である。
【図１１】枠の表示を説明する図である。
【図１２】枠パルスの例を示す図である。
【図１３】映像信号に枠パルスを重畳した信号を示す図である。
【符号の説明】
１レンズブロック，２レンズ，３アイリス，４ＣＣＤ，５サンプルホールドおよび自動利得調整回路，６Ａ／Ｄ変換器，７ディジタルカメラ処理回路，８Ｄ／Ａ変換器，９被写体認識回路，１０画像メモリ，１１追尾信号処理回路，１２モニタ，１３枠表示ＩＣ，１４パンモータ，１５チルトモータ，１６被写体設定ボタン，１７入力部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a subject recognition apparatus and method, and more particularly, to a subject recognition apparatus and method that can recognize a subject more reliably by adaptively deforming a subject model.
[0002]
[Prior art]
When a video camera is to track a given subject automatically, the subject to be tracked must be registered in advance. Conventionally, this registration is performed by capturing an image of the subject together with the background, designating the subject in the captured image (for example, by enclosing it in a frame), obtaining a feature amount from the image data inside the frame, and then recognizing an image having that feature amount as the image of the subject.
[0003]
[Problems to be solved by the invention]
As described above, in the conventional apparatus, the feature amount of the subject (data inside the frame) is extracted without considering the background (data outside the frame). For this reason, when an image similar to the subject exists in the background, there is a problem that the subject cannot be accurately recognized.
[0004]
The present invention has been made in view of such a situation, and makes it possible to reliably recognize a subject.
[0005]
[Means for Solving the Problems]
The subject recognition apparatus according to claim 1 comprises: storage means for storing image data of an image including a subject and a background; model generation means for generating a subject model, which is a model of the feature amount of the image of the subject, based on the feature amount of the image in the region inside a predetermined frame in the image of the image data stored in the storage means; counting means for counting, in the region outside the frame in the image of the image data stored in the storage means, the number of pixels included in the subject model generated by the model generation means; deformation means for deforming the subject model so as to narrow the range of the subject model when the value counted by the counting means is larger than a reference value; and recognition means for recognizing the subject based on the subject model deformed by the deformation means.
[0006]
The subject recognition method according to claim 8 comprises: storing image data of an image including a subject and a background; generating a subject model, which is a model of the feature amount of the image of the subject, based on the feature amount of the image in the region inside a predetermined frame in the image of the stored image data; counting, in the region outside the frame in the image of the stored image data, the number of pixels included in the generated subject model; deforming the subject model so as to narrow the range of the subject model when the counted value is larger than a reference value; and recognizing the subject based on the deformed subject model.
The subject recognition apparatus according to claim 9 comprises: storage means for storing image data of an image including a subject and a background; model generation means for generating a subject model, which is a model of the feature amount of the image of the subject, based on the feature amount of the image in the region inside a predetermined frame in the image of the image data stored in the storage means; counting means for counting, in the region outside the frame in the image of the image data stored in the storage means, the number of pixels included in the subject model generated by the model generation means; deformation means for deforming the subject model so as to widen the range of the subject model when the value counted by the counting means is smaller than a reference value; and recognition means for recognizing the subject based on the subject model deformed by the deformation means.
The subject recognition method according to claim 16 comprises: storing image data of an image including a subject and a background; generating a subject model, which is a model of the feature amount of the image of the subject, based on the feature amount of the image in the region inside a predetermined frame in the image of the stored image data; counting, in the region outside the frame in the image of the stored image data, the number of pixels included in the generated subject model; deforming the subject model so as to widen the range of the subject model when the counted value is smaller than a reference value; and recognizing the subject based on the deformed subject model.
[0007]
In the subject recognition apparatus according to claim 1 and the subject recognition method according to claim 8, image data of an image including a subject and a background is stored; a subject model, which is a model of the feature amount of the image of the subject, is generated based on the feature amount of the image in the region inside a predetermined frame in the image of the stored image data; the number of pixels included in the generated subject model is counted in the region outside the frame in the image of the stored image data; when the counted value is larger than a reference value, the subject model is deformed so as to narrow the range of the subject model; and the subject is recognized based on the deformed subject model.
In the subject recognition apparatus according to claim 9 and the subject recognition method according to claim 16, image data of an image including a subject and a background is stored; a subject model, which is a model of the feature amount of the image of the subject, is generated based on the feature amount of the image in the region inside a predetermined frame in the image of the stored image data; the number of pixels included in the generated subject model is counted in the region outside the frame in the image of the stored image data; when the counted value is smaller than a reference value, the subject model is deformed so as to widen the range of the subject model; and the subject is recognized based on the deformed subject model.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 shows the configuration of an embodiment of a video camera to which the subject recognition apparatus of the present invention is applied. The lens block 1 (imaging means) includes a lens 2, an iris 3, and a CCD (Charge Coupled Device) 4, images the light L from the subject, and outputs an image signal as an electrical signal. That is, the light L from the subject is imaged on the CCD 4 by the lens 2 through the iris 3. Thereby, the CCD 4 outputs an image signal corresponding to the amount of received light. The iris 3 constitutes a so-called auto iris (AE) mechanism, and adjusts the amount of light received by the CCD 4 to an appropriate value.
[0009]
The image signal output from the lens block 1 is sampled and held in the sample-hold (S/H) and automatic gain adjustment (Automatic Gain Control, AGC) circuit 5, gain-controlled by a control signal from the auto iris mechanism so as to have a predetermined gain, and then output to the A/D converter 6.
[0010]
In the present embodiment, the exposure amount is controlled by the auto iris mechanism, but it is also possible to disable this mechanism and capture images with a fixed exposure.
[0011]
The A/D converter 6 A/D-converts the image signal (analog signal) from the sample-hold and automatic gain adjustment circuit 5 according to a predetermined clock. The image signal converted into a digital signal by the A/D converter 6 is supplied to the digital camera processing circuit 7. Based on the image signal from the A/D converter 6, the digital camera processing circuit 7 generates the luminance signal Y, the color difference signals R-Y and B-Y, and the chroma signal C of each pixel constituting the image corresponding to that image signal. The luminance signal Y and the chroma signal C, for example in the NTSC format, are output to the D/A converter 8, where they are D/A-converted and then supplied to the monitor 12. The image captured by the lens block 1 is thereby displayed on the monitor 12. The monitor 12 also displays, superimposed on that image, the frame output from the frame display IC 13.
[0012]
Further, the luminance signal Y and the color difference signals R-Y and B-Y generated by the digital camera processing circuit 7 are supplied to the subject recognition circuit 9. The subject recognition circuit 9 detects the subject to be tracked in the image composed of the luminance signal Y and the color difference signals R-Y and B-Y from the digital camera processing circuit 7.
[0013]
For this purpose, the subject recognition circuit 9 has an image memory 10 (storage means) constituted by a frame memory and a tracking signal processing circuit 11 constituted by a microprocessor. When the image memory 10 receives the write permission signal S1 from the tracking signal processing circuit 11, it stores the luminance signal Y and the color difference signals R-Y and B-Y output from the digital camera processing circuit 7, each independently and pixel by pixel.
[0014]
Hereinafter, the color difference signals R-Y and B-Y are abbreviated as R and B, respectively, as appropriate. The position of the upper-left pixel of the image output by the lens block 1 is taken as the origin (0, 0), and the luminance signal Y and the color difference signals R and B of the pixel that is i-th from the left and j-th from the top are denoted Yij, Rij, and Bij, respectively. The luminance signal Y and the color difference signals R and B are also collectively referred to as image data, as appropriate.
[0015]
When image data for one frame (or one field) has been stored, the image memory 10 outputs a read permission signal S2 to the tracking signal processing circuit 11. Thereafter, when the image memory 10 receives a memory address S3 (corresponding to i and j described above) output from the tracking signal processing circuit 11, it outputs the image data S4 stored at that address to the tracking signal processing circuit 11.
[0016]
When the tracking signal processing circuit 11 receives the read permission signal S2 from the image memory 10, it reads the image data S4 necessary for tracking the subject by giving the address (memory address) S3 to the image memory 10 as described above, and thereby detects the subject to be tracked in the image output from the lens block 1. Thereafter, the tracking signal processing circuit 11 supplies the write permission signal S1 to the image memory 10, whereby the image captured by the lens block 1 is newly stored in the image memory 10 (overwriting the image already stored). At this time, the image memory 10 outputs the read permission signal S2 again as described above. In the same manner, the images captured by the lens block 1 are stored in the image memory 10 one after another.
[0017]
Further, when the tracking signal processing circuit 11 detects a subject, the tracking signal processing circuit 11 drives the pan motor 14 and the tilt motor 15 (driving means) so that the subject is displayed at the center of the image output from the lens block 1. The input unit 17 including various keys, switches, buttons, and the like has a subject setting button 16. This button 16 is operated when the subject setting process is completed.
[0018]
Next, a series of processes performed in the tracking signal processing circuit 11 will be described with reference to the flowchart of FIG. First, in step 1, it is determined whether or not the setting process of the subject to be tracked is completed (whether or not the subject setting button 16 has been operated). When the subject setting process is not yet completed, the user operates the input unit 17 to instruct the tracking signal processing circuit 11 to perform the subject setting process. At this time, the tracking signal processing circuit 11 controls the frame display IC 13 to generate a subject setting frame (designating means), which is output to the monitor 12 for display. Thereby, the subject setting frame D is displayed on the monitor 12, for example, as shown in FIG.
[0019]
Meanwhile, light from the subject incident on the CCD 4 via the lens 2 and the iris 3 is photoelectrically converted by the CCD 4, sampled by the sample-hold and automatic gain control circuit 5, controlled so as to have an appropriate gain, and then input to the A/D converter 6. The signal A/D-converted by the A/D converter 6 is input to the digital camera processing circuit 7, which generates a luminance signal Y and a chroma signal C. The luminance signal Y and the chroma signal C are D/A-converted by the D/A converter 8 and then output to the monitor 12 for display. Therefore, as shown in FIG. 3, the image of the subject is displayed on the monitor 12 together with the background image. As described above, the subject setting frame D is also superimposed on this display.
[0020]
The subject setting frame D shown in FIG. 3 is placed at a predetermined position in the image captured by the lens block 1 (in this embodiment, the center of the screen). To set the subject to be tracked, the user pans or tilts the lens block 1 so that the subject is displayed inside the subject setting frame D. That is, the user operates the input unit 17 to instruct the tracking signal processing circuit 11 to pan or tilt the lens block 1 in a predetermined direction. The tracking signal processing circuit 11 outputs control signals to the pan motor 14 and the tilt motor 15 in response to this command. As a result, the pan motor 14 and the tilt motor 15 drive the lens block 1 to the desired pan position and tilt position.
[0021]
After the subject to be tracked is arranged in the subject setting frame D in this way, the user operates the subject setting button 16 in order to input that the subject setting process has been completed.
[0022]
If it is determined in step 1 that the subject setting button 16 has been operated, the image data inside the subject setting frame D is read from the image memory 10 in step 2. For example, in the case of the embodiment of FIG. 3, image data of a predetermined person's face is read from the image memory 10 as subject data.
[0023]
The image data inside the subject setting frame D is treated, in order to represent its feature amount, as a set of points defined by the triples (Yij, Rij, Bij) of the luminance signal Yij and the color difference signals Rij and Bij. As shown in FIG. 4, these points are plotted on the (R-Y, Y) plane, that is, the (R, Y) plane (FIG. 4(A)), and on the (B-Y, Y) plane, that is, the (B, Y) plane (FIG. 4(B)). In other words, the positions (coordinates) on these planes represent the feature amount of the subject.
[0024]
However, noise is included in the set of points (Yij, Rij, Bij), and this set is merely a set of representative points representing the subject. Therefore, in step 3, (Yij, HRij, HBij) and (Yij, LRij, LBij) are generated as subject information in order to give a width to the set of points (Yij, Rij, Bij). Here, HRij, LRij, HBij, and LBij are calculated according to the following equations, respectively.
HRij = Rij × (1 + α)
LRij = Rij × (1-α)
HBij = Bij × (1 + α)
LBij = Bij × (1-α)
[0025]
In the above equation, α is a positive constant and represents an allowable error for recognizing a predetermined pixel as a subject pixel.
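The four equations above can be sketched as a small function. This is only an illustrative sketch: the function name `tolerance_band` and the sample values are assumptions, not part of the source. Note that because R and B may be negative (the embodiment assigns them the range −128 to 127), the "H" value is the one farther from zero, so it is numerically smaller than the "L" value for a negative input.

```python
def tolerance_band(r, b, alpha):
    """Return (HR, LR, HB, LB) for one pixel's color difference values.

    Implements HR = R*(1+alpha), LR = R*(1-alpha), and likewise for B.
    The name and signature are illustrative assumptions.
    """
    hr = r * (1 + alpha)
    lr = r * (1 - alpha)
    hb = b * (1 + alpha)
    lb = b * (1 - alpha)
    return hr, lr, hb, lb


# Hypothetical sample pixel: R = 100, B = -40, tolerance alpha = 0.25.
print(tolerance_band(100, -40, 0.25))
```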
[0026]
As described above, applying the above calculation to the data shown in FIG. 4 (the data acquired in step 2) yields data that take the allowable error α into account, as shown in FIG. 5. FIG. 5(A) plots the points (Yij, HRij) and (Yij, LRij), and FIG. 5(B) plots the points (Yij, HBij) and (Yij, LBij). In the present embodiment, the range −128 to 127 is assigned to the values representing R and B.
[0027]
Next, in step 4 (model generation means), a subject model is created by approximating the point sets of FIGS. 5(A) and 5(B), which take the allowable error α into account, with, for example, a quadratic function of R or B that takes Y as its argument. In the present embodiment, the Y-intercepts of the quadratic functions (the points at which the quadratic function serving as the subject model crosses the Y axis) are fixed in advance, so that subject models of a somewhat similar shape (quadratic functions in this embodiment) are obtained even when different subjects are approximated.
[0028]
Specifically, as the Y-intercepts, Rlow and Rhigh (where Rlow < Rhigh) are set in advance for the Y-R coordinate system, as shown in FIG. 5(A), and Blow and Bhigh (where Blow < Bhigh) are set in advance for the Y-B coordinate system, as shown in FIG. 5(B).
[0029]
With the Y-intercepts fixed in this way, a quadratic approximation (for example, a least-squares approximation) is performed on each of (Yij, HRij) and (Yij, LRij) in FIG. 5(A) and (Yij, HBij) and (Yij, LBij) in FIG. 5(B). This generates, as the subject model, the quadratic functions HFr(Y) (upper-limit feature model of R with respect to Y), LFr(Y) (lower-limit feature model of R with respect to Y), HFb(Y) (upper-limit feature model of B with respect to Y), and LFb(Y) (lower-limit feature model of B with respect to Y), given by the following equations.
HFr(Y) = A0 × (Y − Rlow) × (Y − Rhigh)
HFb(Y) = A1 × (Y − Blow) × (Y − Bhigh)
LFr(Y) = A2 × (Y − Rlow) × (Y − Rhigh)
LFb(Y) = A3 × (Y − Blow) × (Y − Bhigh)
[0030]
Here, A0, A1, A2, and A3 are constants obtained by the approximation to the data (Yij, HRij), (Yij, HBij), (Yij, LRij), and (Yij, LBij), respectively.
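Because both roots of each quadratic are fixed in advance, only one coefficient per curve remains to be fitted. The source says only "for example, least-squares approximation"; the sketch below assumes an ordinary least-squares fit, for which the coefficient has the closed form A = Σ v·g / Σ g², with g = (Y − low) × (Y − high). The function name `fit_coefficient` and the sample data are hypothetical.

```python
def fit_coefficient(points, low, high):
    """Least-squares A for f(Y) = A * (Y - low) * (Y - high).

    points: iterable of (Y, value) pairs, e.g. (Yij, HRij).
    Assumes ordinary least squares; the source does not fix the method.
    """
    num = 0.0
    den = 0.0
    for y, v in points:
        g = (y - low) * (y - high)  # basis function with the fixed roots
        num += v * g
        den += g * g
    if den == 0.0:
        raise ValueError("all samples lie on the fixed roots; A is undetermined")
    return num / den


# Hypothetical samples lying exactly on A = -0.01 with roots 0 and 200.
samples = [(50, 75.0), (100, 100.0), (150, 75.0)]
a0 = fit_coefficient(samples, low=0, high=200)
```

The same function would be called four times, once per point set, to obtain A0 through A3.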
[0031]
A pixel whose point lies between HFr(Y) and LFr(Y) on the (R, Y) plane and between HFb(Y) and LFb(Y) on the (B, Y) plane, as determined in this way, is regarded as a pixel corresponding to the subject.
[0032]
The details of the method for creating a model as described above have been previously disclosed by the present applicant as Japanese Patent Application No. 8-11655.
[0033]
In step 5 (deformation means), the subject model created in step 4 (FIGS. 6(A) and 6(B)) is corrected according to the background. That is, if the region outside the subject setting frame (the background) on the screen at the time of initial subject setting contains an area that satisfies the subject model, the subject model is narrowed (FIGS. 7(A) and 7(B)); if it does not, the subject model is widened (FIGS. 8(A) and 8(B)).
[0034]
The details of the background-based model change process of step 5 are shown in the flowchart of FIG. 9. In step 21, the data of all pixels in the region outside the subject setting frame D in FIG. 3 are read from the image memory 10. Then, in step 22 (counting means), the number N of those pixels that are included in the subject model generated in step 4 (FIG. 6) is counted. That is, the number of pixels (Yij, Rij, Bij) that satisfy the following expressions is counted.
LFr(Yij) < Rij < HFr(Yij)
LFb(Yij) < Bij < HFb(Yij)
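Steps 21 and 22 can be sketched as follows. This is a minimal illustration, not the circuit's actual implementation: the `SubjectModel` container, the row-major image layout (`image[j][i]` holding a (Y, R, B) triple), and the inclusive frame rectangle are all assumptions. The coefficient assignment follows the model equations (A0 for HFr, A1 for HFb, A2 for LFr, A3 for LFb).

```python
from typing import NamedTuple


class SubjectModel(NamedTuple):
    a0: float      # coefficient of HFr (upper bound on R)
    a1: float      # coefficient of HFb (upper bound on B)
    a2: float      # coefficient of LFr (lower bound on R)
    a3: float      # coefficient of LFb (lower bound on B)
    r_low: float
    r_high: float
    b_low: float
    b_high: float


def in_model(model, pixel):
    """True if LFr(Y) < R < HFr(Y) and LFb(Y) < B < HFb(Y)."""
    y, r, b = pixel
    gr = (y - model.r_low) * (y - model.r_high)
    gb = (y - model.b_low) * (y - model.b_high)
    return (model.a2 * gr < r < model.a0 * gr) and (model.a3 * gb < b < model.a1 * gb)


def count_background_hits(image, frame, model):
    """Count pixels outside the frame that satisfy the subject model (N)."""
    left, top, right, bottom = frame  # inclusive pixel coordinates
    n = 0
    for j, row in enumerate(image):
        for i, pixel in enumerate(row):
            inside = left <= i <= right and top <= j <= bottom
            if not inside and in_model(model, pixel):
                n += 1
    return n
```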
[0035]
Next, in step 23, it is determined whether or not the number N of pixels counted in step 22 is greater than a predetermined reference value β (β> 0) set in advance. When it is determined that the counted number N is larger than the reference value β, the process proceeds to step 24, where the allowable error α of the subject model generated in step 4 is set to α1 (α1 <α). This narrows the subject model as shown in FIG.
[0036]
On the other hand, if it is determined in step 23 that the counted number N is less than or equal to the reference value β, the process proceeds to step 25, where it is determined whether the counted number N is smaller than a reference value γ (γ > 0). If the number N is smaller than the reference value γ, the process proceeds to step 26, and the allowable error α is set to α2 (α2 > α). Thereby, as shown in FIG. 8, the width of the subject model (FIG. 6) generated in step 4 is widened.
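The decision in steps 23 to 26 reduces to a three-way choice of the allowable error. The sketch below is illustrative only; the function and parameter names are assumptions, and it simply returns the unchanged α when N falls between the two reference values, which is what the flow described above implies.

```python
def adjust_tolerance(n, alpha, beta, gamma, alpha_narrow, alpha_wide):
    """Choose the allowable error from the background hit count N.

    alpha_narrow corresponds to α1 (< α), alpha_wide to α2 (> α).
    """
    if n > beta:        # step 23 -> step 24: background resembles the subject
        return alpha_narrow
    if n < gamma:       # step 25 -> step 26: background is clearly different
        return alpha_wide
    return alpha        # neither condition holds: keep the model as generated
```

For example, with hypothetical values β = 30 and γ = 5, a count of N = 50 selects the narrower tolerance, N = 2 selects the wider one, and N = 10 leaves α unchanged.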
[0037]
In this way, when there is an image that approximates the subject in the background, correction is made so that the width of the model of the subject is narrowed, and it is possible to suppress erroneous recognition of a non-subject portion as the subject.
[0038]
On the contrary, when there is no image similar to the subject in the background, the subject can be recognized more reliably by increasing the width of the subject model.
[0039]
Returning to FIG. 2, in step 6 (recognition means), the subject model generated in step 5 (the corrected subject model) is used to extract from the image memory 10 the pixels expected to be part of the subject. That is, among the pixels constituting the image captured by the lens block 1 and stored in the image memory 10, those whose luminance Yij and color difference signals Rij and Bij satisfy both of the following two expressions are extracted as constituent pixels of the subject.
LFr(Yij) < Rij < HFr(Yij)
LFb(Yij) < Bij < HFb(Yij)
[0040]
That is, pixels plotted between the two quadratic functions (as corrected) LFr(Yij) and HFr(Yij) shown in FIG. 5(A), and also between the two quadratic functions (as corrected) LFb(Yij) and HFb(Yij) shown in FIG. 5(B), are detected as pixels constituting the subject.
[0041]
After the subject constituent pixels have been detected in step 6 from the image stored in the image memory 10, it is determined in step 7 whether or not a subject exists, based on the number of subject constituent pixels. That is, when the number of subject constituent pixels detected in step 6 is larger than a predetermined threshold δ, it is determined that a subject exists in the image stored in the image memory 10; when the number is equal to or less than the threshold δ, it is determined that no subject exists in that image.
[0042]
If it is determined in step 7 that a subject is present in the image stored in the image memory 10, then in step 8 a filtering process is applied to the region composed of the subject constituent pixels detected in step 7, in order to remove so-called noise regions around that region. For example, when subject constituent pixels are detected as shown shaded in FIG. 10(A), filtering the shaded region yields the region shown in FIG. 10(B).
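The text does not specify which filter is used in step 8. As one possible illustration only, the sketch below removes isolated pixels by keeping a subject constituent pixel only when enough of its eight neighbours are also subject constituent pixels. The function name and the `min_neighbours` parameter are hypothetical.

```python
def filter_isolated(pixels, min_neighbours=2):
    """Drop pixels that have fewer than min_neighbours 8-connected neighbours.

    pixels is an iterable of (x, y) coordinates. This neighbour-count
    filter is an assumed stand-in for the unspecified filtering process.
    """
    pixel_set = set(pixels)
    kept = set()
    for (x, y) in pixel_set:
        neighbours = sum(
            (x + dx, y + dy) in pixel_set
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        )
        if neighbours >= min_neighbours:
            kept.add((x, y))
    return kept
```

With this choice, a compact cluster survives while a stray pixel far from the cluster is discarded, which matches the intent of removing noise around the subject region.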
[0043]
Thereafter, in step 9, as shown in FIG. 10(B), a display frame (a frame indicating the recognized subject) is displayed so as to surround the subject constituent pixel group detected in step 7. To this end, the tracking signal processing circuit 11 controls the frame display IC 13 to generate a frame pulse at the position where the display frame is to be displayed. The monitor 12 superimposes the frame pulse on the video signal. For example, when a display frame is displayed as shown in FIG. 11, the frame pulse of the line indicated by the arrow in FIG. 11 is as shown in FIG. 12, and when this frame pulse is superimposed on the video signal of that line, a video signal as shown in FIG. 13 is obtained. By displaying this video signal on the monitor 12, an image as shown in FIG. 11 is displayed.
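The relationship between the display frame and the frame pulse of one scan line can be sketched as follows. This is an illustrative model only: the function names are hypothetical, the frame is assumed to be the one-pixel-thick bounding rectangle of the subject constituent pixels, and the pulse is represented as a list of 0 and 1 values rather than a voltage waveform.

```python
def bounding_frame(pixels):
    """Return (left, top, right, bottom) of the rectangle enclosing pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs), max(ys)


def frame_pulse_for_line(frame, line, width):
    """Return the assumed frame pulse (0 or 1 per column) for one scan line."""
    left, top, right, bottom = frame
    if line < top or line > bottom:
        # Lines above or below the frame carry no pulse.
        return [0] * width
    if line in (top, bottom):
        # Top and bottom edges: pulse across the whole width of the frame.
        return [1 if left <= x <= right else 0 for x in range(width)]
    # Lines crossing the frame: pulse only at the left and right edges.
    return [1 if x in (left, right) else 0 for x in range(width)]
```

A line that crosses the middle of the frame therefore carries two short pulses, one at each vertical edge.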
[0044]
Thereafter, in step 10, the center of gravity of the set of subject constituent pixels filtered in step 8 (for example, the center of gravity on the xy plane with the horizontal direction as the x-axis and the vertical direction as the y-axis) is obtained (the position (coordinates) indicated by the X mark in FIG. 10(B)), and this is taken as the position of the subject.
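The center of gravity described above is the mean of the x coordinates and the mean of the y coordinates of the filtered pixels. A minimal sketch, with a hypothetical function name and with every pixel assumed to carry equal weight:

```python
def centroid(pixels):
    """Return the (x, y) center of gravity of the pixels, or None if empty."""
    points = list(pixels)
    if not points:
        return None
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y
```

For the four corner pixels (0, 0), (2, 0), (0, 2), and (2, 2), the result is (1.0, 1.0).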
[0045]
Further, in step 11, the pan motor 14 and the tilt motor 15 are rotationally driven so that the position of the subject obtained in step 10 coincides with the center position of the image output from the lens block 1. As a result, the lens block 1 is panned and tilted, and the subject is drawn to the center of the display screen of the monitor 12.
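The quantity that the pan and tilt drive tries to bring to zero is the offset between the subject position and the image center. The sketch below computes only that offset. The function name is hypothetical, and how the offset is converted into motor commands (gain, sign, units) is not described in the text and is left out.

```python
def pan_tilt_error(subject_pos, image_size):
    """Return (horizontal, vertical) offset of the subject from the image center.

    subject_pos is (x, y) in pixels; image_size is (width, height).
    Assumption: pixel coordinates start at 0, so the center is at
    ((width - 1) / 2, (height - 1) / 2).
    """
    width, height = image_size
    center_x = (width - 1) / 2
    center_y = (height - 1) / 2
    return subject_pos[0] - center_x, subject_pos[1] - center_y
```

When both components are zero, the subject is already at the center and no further panning or tilting is needed.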
[0046]
Next, the process proceeds to step 12, where it is determined whether or not the end of the process has been commanded. If the end of the process is instructed, the process ends.
[0047]
If it is determined in step 7 that the subject does not exist, the process proceeds to step 13, where the frame is erased. That is, the tracking signal processing circuit 11 controls the frame display IC 13 to stop the generation of the frame pulse. To erase the frame, the frame pulse may be set to 0 V. Alternatively, instead of erasing the frame, the size of the frame may be changed, for example maximized, so that the display differs from the normal one.
[0048]
In the above embodiment, the subject recognition circuit 9 is built in the video camera, but it may be provided as a device external to the video camera.
[0049]
Thus, using this video camera, for example, it is possible to realize a system that monitors a predetermined room and automatically tracks a person entering the room or automatically tracks a speaker in a video conference system.
[0050]
Note that the present invention can also be applied to devices other than video cameras.
[0051]
[Effect of the invention]
As described above, according to the subject recognition apparatus according to claim 1 and the subject recognition method according to claim 8, the subject model is generated adaptively with respect to the background, so that the subject can be recognized reliably regardless of the background. In addition, it is possible to prevent a portion that is not the subject from being erroneously recognized as the subject.
Furthermore, according to the subject recognition apparatus according to claim 9 and the subject recognition method according to claim 16, the subject model is generated adaptively with respect to the background, so that the subject can be recognized reliably regardless of the background. In addition, the subject can be recognized more reliably.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration example of a video camera to which a subject recognition apparatus of the present invention is applied.
FIG. 2 is a flowchart for explaining the operation of the tracking signal processing circuit 11 in the embodiment of FIG. 1.
FIG. 3 is a diagram for explaining setting of a subject.
FIG. 4 is a diagram illustrating a subject image at the time of subject setting.
FIG. 5 is a diagram illustrating a subject model.
FIG. 6 is a diagram illustrating a subject model.
FIG. 7 is a diagram illustrating a narrowed subject model.
FIG. 8 is a diagram illustrating a widened subject model.
FIG. 9 is a flowchart showing details of the model change processing based on the background in step 5 of FIG. 2.
FIG. 10 is a diagram for explaining the filtering process in step 8 of FIG. 2.
FIG. 11 is a diagram illustrating display of a frame.
FIG. 12 is a diagram illustrating an example of a frame pulse.
FIG. 13 is a diagram illustrating a signal in which a frame pulse is superimposed on a video signal.
[Explanation of symbols]
1 lens block, 2 lens, 3 iris, 4 CCD, 5 sample hold and automatic gain adjustment circuit, 6 A/D converter, 7 digital camera processing circuit, 8 D/A converter, 9 subject recognition circuit, 10 image memory, 11 tracking signal processing circuit, 12 monitor, 13 frame display IC, 14 pan motor, 15 tilt motor, 16 subject setting button, 17 input unit

Claims

A subject recognition apparatus comprising:
storage means for storing image data of an image including a subject and a background;
model generation means for generating a subject model, which is a model of a feature amount of an image of the subject, based on a feature amount of an image of an area within a predetermined frame in the image of the image data stored in the storage means;
counting means for counting the number of pixels included in the subject model generated by the model generation means, in an area outside the frame in the image of the image data stored in the storage means;
deformation means for deforming the subject model so as to narrow the range of the subject model when the value counted by the counting means is larger than a reference value; and
recognition means for recognizing the subject based on the subject model deformed by the deformation means.

The subject recognition apparatus according to claim 1, further comprising subject setting means for setting, as an image of the subject, an image of an area within the frame in an image including the subject and the background,
wherein the model generation means generates the subject model from the image set as the subject by the subject setting means.

The subject recognition apparatus according to claim 1, further comprising driving means for panning or tilting imaging means that images the subject, in accordance with a recognition result of the recognition means.

The subject recognition apparatus according to claim 1, further comprising:
imaging means for imaging the subject; and
driving means for panning or tilting the imaging means in accordance with a recognition result of the recognition means.

The subject recognition apparatus according to claim 1, further comprising:
driving means for panning or tilting imaging means that images the subject; and
drive control means for controlling the driving means, in accordance with a recognition result of the recognition means, so that the position of the subject coincides with the center position of the image captured by the imaging means.

The subject recognition apparatus according to claim 1, further comprising:
imaging means for imaging the subject;
driving means for panning or tilting the imaging means; and
drive control means for controlling the driving means, in accordance with a recognition result of the recognition means, so that the position of the subject coincides with the center position of the image captured by the imaging means.

The subject recognition apparatus according to claim 1, further comprising display control means for controlling display on display means so as to display a frame image that corresponds to the frame and that separates the subject from the background in an image including the subject and the background.

A subject recognition method comprising:
storing image data of an image including a subject and a background;
generating a subject model, which is a model of a feature amount of an image of the subject, based on a feature amount of an image of an area within a predetermined frame in the image of the stored image data;
counting the number of pixels included in the generated subject model, in an area outside the frame in the image of the stored image data;
deforming the subject model so as to narrow the range of the subject model when the counted value is larger than a reference value; and
recognizing the subject based on the deformed subject model.

A subject recognition apparatus comprising:
storage means for storing image data of an image including a subject and a background;
model generation means for generating a subject model, which is a model of a feature amount of an image of the subject, based on a feature amount of an image of an area within a predetermined frame in the image of the image data stored in the storage means;
counting means for counting the number of pixels included in the subject model generated by the model generation means, in an area outside the frame in the image of the image data stored in the storage means;
deformation means for deforming the subject model so as to widen the range of the subject model when the value counted by the counting means is smaller than a reference value; and
recognition means for recognizing the subject based on the subject model deformed by the deformation means.

The subject recognition apparatus according to claim 9, further comprising subject setting means for setting, as an image of the subject, an image of an area within the frame in an image including the subject and the background,
wherein the model generation means generates the subject model from the image set as the subject by the subject setting means.

The subject recognition apparatus according to claim 9, further comprising driving means for panning or tilting imaging means that images the subject, in accordance with a recognition result of the recognition means.

The subject recognition apparatus according to claim 9, further comprising:
imaging means for imaging the subject; and
driving means for panning or tilting the imaging means in accordance with a recognition result of the recognition means.

The subject recognition apparatus according to claim 9, further comprising:
driving means for panning or tilting imaging means that images the subject; and
drive control means for controlling the driving means, in accordance with a recognition result of the recognition means, so that the position of the subject coincides with the center position of the image captured by the imaging means.

The subject recognition apparatus according to claim 9, further comprising:
imaging means for imaging the subject;
driving means for panning or tilting the imaging means; and
drive control means for controlling the driving means, in accordance with a recognition result of the recognition means, so that the position of the subject coincides with the center position of the image captured by the imaging means.

The subject recognition apparatus according to claim 9, further comprising display control means for controlling display on display means so as to display a frame image that corresponds to the frame and that separates the subject from the background in an image including the subject and the background.

A subject recognition method comprising:
storing image data of an image including a subject and a background;
generating a subject model, which is a model of a feature amount of an image of the subject, based on a feature amount of an image of an area within a predetermined frame in the image of the stored image data;
counting the number of pixels included in the generated subject model, in an area outside the frame in the image of the stored image data;
deforming the subject model so as to widen the range of the subject model when the counted value is smaller than a reference value; and
recognizing the subject based on the deformed subject model.