JP4711885B2

JP4711885B2 - Remote control device and method

Info

Publication number: JP4711885B2
Application number: JP2006144857A
Authority: JP
Inventors: 和也高木
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2006-05-25
Filing date: 2006-05-25
Publication date: 2011-06-29
Anticipated expiration: 2026-05-25
Also published as: JP2007316882A

Description

The present invention relates to a remote control method for remotely operating a target device by gesture from a distance, and to a remote control device that implements the method.

In general, an infrared remote controller (remote control) is used to operate a target device remotely. Conventional remote controls, however, have the following problems.

First, users easily lose the remote control. In addition, because users nowadays often own several remote controls for several devices, it is easy to pick up the wrong one.

Furthermore, because the operation buttons of a remote control are generally small, users easily press the wrong button. A remote control is also generally battery-powered, so its batteries must be replaced periodically.

To solve these problems, a method of performing remote operation by reading the user's gesture with a video camera has been proposed (see, for example, Patent Document 1). As a technique for recognizing the user's gesture, it has been proposed to extract skin-colored portions from a captured image, identify the face region by template matching, and then recognize motion by pattern matching (see, for example, Patent Document 2).

Patent Document 1: JP 2003-186596 (page 6, FIG. 1)
Patent Document 2: JP 2003-216955 A (page 11, FIG. 1)

However, Patent Document 1 does not disclose the details of how the user's gesture is read. Patent Document 2 assumes a personal computer with high processing capability and performs high-accuracy, complex processing such as template matching, which makes it difficult to apply to, for example, household appliances.

The present invention has been made to solve the above problems, and its object is to realize remote operation by a user's gesture with processing simple enough to be applied to household appliances and the like.

The remote control device according to the present invention comprises: imaging means for acquiring a captured image including an image of a user; feature point detection means for detecting feature points from the captured image; gesture estimation means for estimating the user's gesture based on the detected feature points; and operation signal transmission means for generating an operation signal corresponding to the gesture and transmitting it to a remote operation target device. The device further comprises feature point existence range estimation means for estimating, as a feature point existence range, a region of the captured image in which a feature point may exist. In the captured image, the feature point detection means performs a first detection process in a region where a plurality of feature point existence ranges overlap, performs a second detection process, which is less accurate than the first detection process but requires less computation, in a region where the feature point existence ranges do not overlap and only one feature point existence range is present, and detects the feature points by integrating the results of both detection processes.

The remote control method according to the present invention includes: an imaging step of acquiring a captured image including an image of a user; a feature point detection step of detecting feature points from the captured image; a gesture estimation step of estimating the user's gesture based on the detected feature points; and an operation signal transmission step of generating an operation signal corresponding to the gesture and transmitting it to a remote operation target device. The method further includes a feature point existence range estimation step of estimating, as a feature point existence range, a region of the captured image in which a feature point may exist. In the feature point detection step, a first detection process is performed in a region of the captured image where a plurality of feature point existence ranges overlap, a second detection process, which is less accurate than the first detection process but requires less computation, is performed in a region where the feature point existence ranges do not overlap and only one feature point existence range is present, and the feature points are detected by integrating the results of both detection processes.

According to the present invention, the region of the captured image in which the high-accuracy, computation-heavy (and therefore high-load) detection process is performed is limited to the first region, which reduces the overall load of the remote operation processing. As a result, remote operation by gesture input can be realized even in a device whose control unit has relatively low processing capability, such as a household appliance.

Whereas the conventional technique performs high-accuracy, complex detection processing over the entire captured image, the present invention focuses on the fact that the processing load required for gesture estimation can be reduced by limiting the region in which the high-accuracy, computation-heavy detection process is performed (that is, by performing that process only in a region that may contain a plurality of feature points). In this way, the remote operation processing can be realized in a device whose control unit has relatively low processing capability (for example, a household appliance).

Embodiments of the present invention are described below with reference to the accompanying drawings.

Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a remote control device using gesture input according to Embodiment 1 of the present invention. The remote control device includes an imaging unit 10, such as a camera, that acquires a captured image including an image of the user, and a gesture recognition unit 1 that reads a gesture from the captured image acquired by the imaging unit 10. The gesture recognition unit 1 includes an image processing unit 11 that detects feature points from the image captured by the imaging unit 10, and a recognition processing unit 12 that estimates a gesture from changes in the positions of the feature points detected by the image processing unit 11 and transmits an operation signal corresponding to the gesture to the remote operation target device 13. The device control unit 13a of the remote operation target device 13 receives the operation signal from the remote control device and performs the operation corresponding to the received signal.

The image processing unit 11 of the remote control device includes a feature point existence range estimation unit 14 that estimates the range of the captured image in which a feature point may exist, and a feature point detection unit 15 that detects feature points within that feature point existence range. The recognition processing unit 12 of the remote control device includes a gesture estimation unit 16 that estimates a gesture based on the feature points, and an operation signal transmission unit 17 that generates an operation signal corresponding to the estimated gesture and transmits it to the remote operation target device 13.

FIG. 2 is a diagram showing an example of a captured image in Embodiment 1 of the present invention. The feature point existence range estimation unit 14 (FIG. 1) determines the regions of the captured image that are likely to contain the feature points needed for gesture estimation, based on changes in the image relative to the past several frames and on the positions of the feature points in the previous frame. A feature point is a "portion having a certain characteristic" in the captured image; here it refers to the user's hand or head.

A region in which a feature point may exist is a region of the captured image that contains the user. For example, in the captured image shown in FIG. 2, if a region boundary 20 is defined so that the image of the user lies inside it, the inside of the region boundary 20 is a region 21 in which feature points may exist, and the outside of the region boundary 20 is a region 22 in which feature points are unlikely to exist. The region boundary 20 shown in FIG. 2 is rectangular, but it may have another shape, such as an ellipse.

When feature points are detected, the entire captured image is not searched. Instead, among the regions in which the captured image has changed during the past several frames, a region in which a feature point may exist is estimated as a "feature point existence range", and only this feature point existence range is searched. Regions in which the captured image has not changed are unlikely to contain a feature point and are therefore excluded from detection.

By distinguishing regions where a plurality of feature point existence ranges overlap from regions where they do not, the feature point existence range estimation unit 14 divides the captured image into regions that contain no feature point at all, regions that may contain only one feature point, and regions that may contain a plurality of feature points.

FIG. 3 is a diagram showing an example in which a captured image is divided in this way. The existence range 30 of feature point 1 (for example, the user's head) is a region inside which feature point 1 may exist. The existence range 31 of feature point 2 (for example, the user's right hand) is a region inside which feature point 2 may exist. Depending on how these two regions (feature point existence ranges 30 and 31) overlap, the image can be divided into a region 32 that contains at most feature point 1, a region 33 that contains at most feature point 2, and a region 34 that may contain both feature points 1 and 2.
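The division into regions 32, 33, and 34 can be sketched as follows. This is only an illustration: it assumes that each feature point existence range is an axis-aligned rectangle, and the names `Rect`, `intersection`, and `classify_pixel` are hypothetical and do not come from the patent.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle; right and bottom edges are exclusive (assumed shape)."""
    left: int
    top: int
    right: int
    bottom: int


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two existence ranges, or None if they do not overlap."""
    left, top = max(a.left, b.left), max(a.top, b.top)
    right, bottom = min(a.right, b.right), min(a.bottom, b.bottom)
    if left >= right or top >= bottom:
        return None
    return Rect(left, top, right, bottom)


def contains(r: Rect, x: int, y: int) -> bool:
    return r.left <= x < r.right and r.top <= y < r.bottom


def classify_pixel(x: int, y: int, range_1: Rect, range_2: Rect) -> str:
    """Classify a pixel by which existence ranges cover it."""
    in_1, in_2 = contains(range_1, x, y), contains(range_2, x, y)
    if in_1 and in_2:
        return "overlap"   # corresponds to region 34
    if in_1:
        return "only_1"    # corresponds to region 32
    if in_2:
        return "only_2"    # corresponds to region 33
    return "none"          # no feature point expected here
```

With a head range `Rect(0, 0, 10, 10)` and a hand range `Rect(6, 4, 16, 14)`, the overlap is `Rect(6, 4, 10, 10)`, and only pixels inside it would be passed to the more expensive detection process.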

FIG. 4 is a diagram schematically showing an example of how the captured image is divided. As shown in FIG. 4, when the feature point existence range estimation unit 14 estimates the feature point existence range 30 so that it includes feature point 1 (the user's head) and estimates the feature point existence range 31 so that it includes feature point 2 (the user's right hand), a portion arises in which the two feature point existence ranges 30 and 31 overlap. This overlapping portion is the "region 34 where the feature point existence ranges overlap". In FIG. 4, the feature point existence ranges are drawn larger than they would be in practice so that the overlapping region is easy to see.

The region 34 where the feature point existence ranges 30 and 31 overlap may contain as many feature points as there are overlapping ranges. For example, the region 34 where the existence range 30 of feature point 1 and the existence range 31 of feature point 2 overlap may contain both feature point 1 and feature point 2. Within this overlapping region 34, the feature point detection unit 15 (FIG. 1) performs a high-accuracy, computation-heavy (and therefore high-load) feature point detection process, detects a plurality of pixel clusters separately from one another, and treats them as a plurality of "feature point candidates".

In the regions 32 and 33 where the feature point existence ranges 30 and 31 do not overlap, on the other hand, no feature points lie close to each other, and each region most likely contains at most one feature point, so there is little need to distinguish a plurality of feature points. In these regions 32 and 33, therefore, a low-accuracy, low-computation (and therefore low-load) feature point detection process is performed to detect a single pixel cluster, which is treated as a "feature point candidate". If a plurality of pixel clusters happen to be detected in regions 32 and 33, the one most likely to be a feature point candidate, for example the one with the largest area, is adopted (the maximum-likelihood feature point candidate).

FIG. 5 is a diagram showing a case in which a feature point straddles the region 34 where the feature point existence ranges 30 and 31 overlap and the region 32 where they do not. When a pixel cluster C1 detected by the high-accuracy detection process in the overlapping region 34 and a pixel cluster C2 detected by the low-accuracy detection process in the non-overlapping region 32 are contiguous across the boundary, they are combined and judged to be one feature point (feature point 1 in FIG. 5). A pixel cluster C3 that is not contiguous with any other cluster is judged to be a feature point by itself (feature point 2 in FIG. 5). Judging the continuity of the pixel clusters detected in regions 34 and 32 is called "integration of detection results".

The gesture estimation unit 16 (FIG. 1) detects the current motion vector of each feature point from the change in its position over several frames, as detected by the feature point detection unit 15, and estimates the user's gesture by comparing the result with gesture pattern sequences registered in advance as sequences of motion vectors.
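The comparison of motion vectors against registered gesture pattern sequences might look like the sketch below. The patent does not specify how vectors are encoded or matched, so the four-direction quantization, the `min_length` threshold, the exact-match rule, and the two entries in `GESTURE_PATTERNS` are all assumptions made for illustration.

```python
def motion_vectors(positions):
    """Frame-to-frame displacement of one feature point."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]


def quantise_direction(vector, min_length=2):
    """Map a displacement to a coarse direction (image y grows downward)."""
    dx, dy = vector
    if abs(dx) < min_length and abs(dy) < min_length:
        return "still"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


def collapse(directions):
    """Drop 'still' entries and consecutive repeats."""
    out = []
    for d in directions:
        if d != "still" and (not out or out[-1] != d):
            out.append(d)
    return out


# Hypothetical registered gesture patterns (not from the patent).
GESTURE_PATTERNS = {
    "wave_vertical": ["up", "down", "up"],
    "swipe_right": ["right"],
}


def estimate_gesture(positions, patterns=GESTURE_PATTERNS):
    """Return the name of the matching registered pattern, or None."""
    observed = collapse([quantise_direction(v) for v in motion_vectors(positions)])
    for name, pattern in patterns.items():
        if observed == pattern:
            return name
    return None
```

Returning `None` corresponds to the case in which no gesture has been estimated yet, so the caller would simply capture the next frame.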

The operation signal transmission unit 17 (FIG. 1) generates an operation signal corresponding to the user's gesture estimated by the gesture estimation unit 16 and transmits it to the device control unit 13a of the remote operation target device 13 (for example, a household appliance). The remote operation target device 13 then performs the operation corresponding to the operation signal, which realizes remote operation by gesture.

FIG. 6 is a flowchart showing the overall flow of the remote operation processing in Embodiment 1. Of the remote operation processing shown in FIG. 6, the steps from the estimation of the feature point existence ranges (step S11) to the integration of detection results (step S18) are performed by the image processing unit 11. The steps from gesture estimation (step S19) to transmission of the operation signal (step S21) are performed by the recognition processing unit 12.

In this remote operation processing, the imaging unit 10 (a camera) first captures an image and acquires a captured image that includes an image of the user (step S10).

Next, the feature point existence range estimation unit 14 of the image processing unit 11 estimates the feature point existence ranges from among the portions of the captured image that have changed, based on, for example, the history of several frames captured by the imaging unit 10 (step S11). If the captured image has not changed at all and no feature point existence range could be estimated (step S12), the process returns to step S10 and the imaging unit 10 captures another image.

Next, based on the feature point existence ranges estimated in step S11, it is determined whether there is a region where a plurality of feature point existence ranges overlap (step S13).

If there is a region where feature point existence ranges overlap, the feature point detection unit 15 performs the high-accuracy, computation-heavy detection process in that region, detects a plurality of pixel clusters separately, and obtains a plurality of "feature point candidates" (step S14). This detection process is, for example, a labeling process.
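A labeling process of this kind can be sketched as connected-component labeling over a binary mask. The patent names labeling only as an example and gives no details, so the 4-connectivity, the breadth-first fill, and the function name `label_blobs` are assumptions.

```python
from collections import deque


def label_blobs(mask):
    """4-connected labelling of a binary mask given as a list of rows of 0/1.

    Returns (labels, count): labels holds 0 for background and 1..count
    for each separate pixel cluster.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue
            # Found an unlabelled foreground pixel: start a new cluster.
            count += 1
            labels[y][x] = count
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((nx, ny))
    return labels, count
```

Because every foreground pixel is visited and its neighbors are inspected, the cost grows with the area of the mask, which is consistent with restricting this step to the overlapping region.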

Whenever there is a region where feature point existence ranges overlap, there is also a region where they do not (see FIG. 3). In that non-overlapping region, the low-accuracy, low-computation detection process is performed to detect one pixel cluster and obtain one "feature point candidate" (step S15). This detection process is a simple one, for example detection using a color histogram (the color histogram method), and can therefore be performed relatively quickly. The order of steps S14 and S15 may be reversed.
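The text does not describe the color histogram method in detail. One possible reading, offered only as a sketch, is to build a coarse histogram from reference colors (for example, skin samples), mark the pixels whose color bin is common in that histogram, and take the centroid of the marked pixels as the single candidate without separating clusters. The bin count, the `min_prob` threshold, and all function names here are assumptions.

```python
def quantise(rgb, bins=4):
    """Map an (r, g, b) triple with 0-255 channels to a coarse histogram bin."""
    step = 256 // bins
    r, g, b = rgb
    return (r // step, g // step, b // step)


def build_histogram(samples, bins=4):
    """Relative frequency of each colour bin among reference samples."""
    counts = {}
    for rgb in samples:
        key = quantise(rgb, bins)
        counts[key] = counts.get(key, 0) + 1
    total = len(samples)
    return {key: n / total for key, n in counts.items()}


def detect_single_candidate(image, hist, min_prob=0.1, bins=4):
    """Return the centroid (x, y) of pixels whose colour is common in hist, or None.

    image is a list of rows of (r, g, b) triples. A single pass with no
    cluster separation keeps the cost low at the price of accuracy.
    """
    sum_x = sum_y = n = 0
    for y, row in enumerate(image):
        for x, rgb in enumerate(row):
            if hist.get(quantise(rgb, bins), 0.0) >= min_prob:
                sum_x += x
                sum_y += y
                n += 1
    if n == 0:
        return None
    return (sum_x / n, sum_y / n)
```

This sketch always yields at most one candidate, which matches the expectation that a non-overlapping region contains at most one feature point.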

The detection results of the high-accuracy, computation-heavy detection process (step S14) and the low-accuracy, low-computation detection process (step S15) are then integrated (step S16). That is, it is determined whether a feature point candidate detected by the high-accuracy process and a feature point candidate detected by the low-accuracy process are contiguous. If they are, they are combined into one "feature point" (see FIG. 5). If the feature point candidates are not contiguous, each is treated as a separate "feature point".
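The integration step can be sketched as merging pixel clusters that touch across the region boundary. Representing each candidate as a set of `(x, y)` pixel coordinates and treating 4-adjacency as "contiguous" are assumptions, as are the names `touches` and `integrate`.

```python
def touches(blob_a, blob_b):
    """True if some pixel of blob_a is shared with or 4-adjacent to a pixel of blob_b."""
    for x, y in blob_a:
        for neighbour in ((x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if neighbour in blob_b:
                return True
    return False


def integrate(precise_blobs, coarse_blobs):
    """Merge contiguous candidates from both detectors into feature points.

    Each blob is an iterable of (x, y) pixels; the result is a list of
    pixel sets, one per feature point.
    """
    merged = []
    for blob in list(precise_blobs) + list(coarse_blobs):
        blob = set(blob)
        keep = []
        for existing in merged:
            if touches(blob, existing):
                blob |= existing      # contiguous: fold into one feature point
            else:
                keep.append(existing)
        keep.append(blob)
        merged = keep
    return merged
```

In terms of FIG. 5, clusters C1 and C2 would be merged into feature point 1, while the isolated cluster C3 would remain feature point 2.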

If, on the other hand, step S13 finds no region where feature point existence ranges overlap, only the low-accuracy, low-computation detection process is performed to obtain one "feature point candidate" (step S17). As in step S15, this low-accuracy, low-computation detection process is, for example, detection using a color histogram.

Next, it is determined whether a feature point has been detected (step S18). If no feature point has been detected, the process returns to step S10 and the imaging unit 10 captures another image.

If a feature point has been detected, the gesture estimation unit 16 estimates the user's gesture from the position information of the feature points (step S19). For example, if it is determined from changes in the captured image over a plurality of frames that the position of feature point 2 (the user's hand) is moving up and down relative to the position of feature point 1 (the user's head), a gesture such as "the user is moving the right hand up and down" is estimated.

Here, "head" means the part of a person above the neck and includes the face. Any other part of the body may be used as a feature point instead of the user's head, provided that it can serve as a reference for judging the movement of the user's hand.

Next, it is determined whether a gesture has been estimated (step S20). If a gesture has been estimated, the operation signal transmission unit 17 generates an operation signal corresponding to the gesture and transmits it to the remote operation target device 13 (step S21). If a gesture has not yet been estimated, or once transmission of the operation information is complete, the process returns to step S10, the imaging unit 10 captures another image, and the steps described above (steps S10 to S21) are repeated.

The device control unit 13a of the remote operation target device 13 controls the remote operation target device 13 so that it executes the operation corresponding to the received operation information. Remote operation of the remote operation target device 13 is realized in this way.

As described above, in Embodiment 1 the high-accuracy, computation-heavy (high-load) detection process is performed in the region 34 where the feature point existence ranges 30 and 31 overlap, and the low-accuracy, low-computation (low-load) detection process is performed in the regions 32 and 33 where they do not overlap. Limiting the region in which the high-accuracy, computation-heavy detection process is performed in this way makes it possible to realize remote operation with simple processing.

Because remote operation by gesture can be performed with simple processing, the remote control device can be realized even in a device with relatively limited hardware resources, such as a household appliance.

In addition, because the high-accuracy, computation-heavy detection process is performed in the region 34 where the feature point existence ranges 30 and 31 overlap, a plurality of feature points can be distinguished and recognized accurately even when they lie close together, and the user's gesture can therefore be read accurately.

Using the user's hand and head as feature points also makes it easy to estimate a gesture in which the user moves the hand, with the position of the head as the reference.

Embodiment 2.
FIG. 7 is a flowchart showing the flow of the feature point existence range estimation process in Embodiment 2 of the present invention. The process described in Embodiment 2 is a specific example of the feature point existence range estimation process (step S11 in FIG. 6) executed by the feature point existence range estimation unit 14 (FIG. 1) described in Embodiment 1.

In the feature point existence range estimation process, the portions of the captured image that have changed are first identified by taking the difference between the captured image and either a background image or an image from several frames earlier (step S30). Within those portions, the neighborhood of each position at which a feature point existed in the previous frame is then searched (step S31). The interval between frames is short, for example about 60 ms, so the position of a feature point changes only slightly from one frame to the next even in the middle of a gesture. A feature point can therefore be expected to exist in the current frame near the position at which it existed in the previous frame. In other words, the portions that may contain a feature point can be narrowed down from the portions in which the captured image has changed. The "portion in which the captured image has changed and which is near the position of a feature point in the previous frame" is therefore estimated as the feature point existence range (step S32).
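Steps S31 and S32 can be sketched as a search of the change mask restricted to a window around the previous feature point position. The square search window, the `radius` parameter, and the bounding-box return value are assumptions, since the patent does not say how large the "neighborhood" is or how the range is represented.

```python
def estimate_existence_range(changed, previous_position, radius):
    """Bounding box of changed pixels near the previous feature point position.

    changed is a binary mask (list of rows of 0/1) from the difference step.
    Returns (left, top, right, bottom), all inclusive, or None if nothing
    changed inside the search window.
    """
    px, py = previous_position
    xs, ys = [], []
    for y in range(max(0, py - radius), min(len(changed), py + radius + 1)):
        row = changed[y]
        for x in range(max(0, px - radius), min(len(row), px + radius + 1)):
            if row[x]:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

A `None` result corresponds to the case in step S12 where no feature point existence range could be estimated and a new image is captured.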

The image from which the captured image is differenced in step S30 may be either a background image captured in advance or a captured image from several frames earlier. The difference may be computed from the RGB values of each pixel, or from values obtained by converting the RGB values into another color space, for example the HSV space.
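The two ways of differencing a pixel can be compared with a small sketch that uses Python's standard `colorsys` module. The sum of absolute differences in RGB and the per-channel differences in HSV are illustrative choices, not something the patent prescribes.

```python
import colorsys


def rgb_difference(a, b):
    """Sum of absolute channel differences between two (r, g, b) pixels (0-255)."""
    return sum(abs(ca - cb) for ca, cb in zip(a, b))


def hsv_difference(a, b):
    """Per-channel (hue, saturation, value) differences, each in the range 0 to 1."""
    ha, sa, va = colorsys.rgb_to_hsv(*(c / 255 for c in a))
    hb, sb, vb = colorsys.rgb_to_hsv(*(c / 255 for c in b))
    dh = abs(ha - hb)
    dh = min(dh, 1 - dh)  # hue is circular, so take the shorter way round
    return dh, abs(sa - sb), abs(va - vb)
```

For the pixels `(200, 100, 50)` and `(100, 50, 25)`, which are the same color at half the brightness, the RGB difference is 175, whereas in HSV only the value channel differs. This suggests one reason a different color space might be preferred when lighting varies.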

When the difference from the background image (hereinafter, the background difference) is used, the captured image can be separated into a background portion and a foreground portion. The background portion is the portion in which the image has hardly changed, and it is unlikely to contain a feature point, so when the feature point existence range is estimated in step S32 it can be narrowed down to the foreground portion of the captured image.

The background image may be one captured automatically by the imaging unit 10 when the remote control device starts up, or one captured by the imaging unit 10 when the user operates a button or the like on the device body. In either case, the image must be captured so that the user, who is a foreground candidate, does not appear in the background image.

The background image is updated successively to reduce the effect of changes in illumination over time (step S33). The update may be performed at a fixed interval, or when an abrupt change in illumination is detected. When the change in illumination is small, the processing load can be reduced by updating the background image locally over several frames instead of updating the whole image at once (for example, by dividing the whole image into blocks of 4 × 4 pixels and updating only one pixel in each block per frame).
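The staggered update in the example (one pixel per 4 × 4 block per frame) can be sketched as follows. The raster order in which pixels within a block are visited and the function name `update_background` are assumptions.

```python
def update_background(background, frame, frame_index, block=4):
    """Copy one pixel per block x block tile from frame into background, in place.

    The pixel chosen inside each tile advances with frame_index, so the
    whole background is refreshed once every block * block frames.
    """
    offset = frame_index % (block * block)
    offset_y, offset_x = divmod(offset, block)
    for y in range(offset_y, len(background), block):
        for x in range(offset_x, len(background[y]), block):
            background[y][x] = frame[y][x]
    return background
```

With 4 × 4 blocks, each call touches one sixteenth of the pixels, and 16 consecutive frames refresh the entire background image.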

On the other hand, when the difference from the captured image several frames earlier (hereinafter, the inter-frame difference) is used in step S30, the change that occurred during those frames, mainly the motion of the subject, can be obtained. In step S32, the feature point existence range can therefore be narrowed down to the vicinity of the part that moved. When the inter-frame difference is used, the first frame cannot be processed; in that case, the background difference may be used for the first frame only, and the inter-frame difference from the second frame onward.

From the second frame onward, both the background difference and the inter-frame difference may be used. The feature point existence range can then be narrowed down further, to the parts of the foreground portion that are actually moving.

When the background difference and the inter-frame difference are obtained in step S30, the difference may be computed pixel by pixel, or for a group of pixels at a time, for example a 2 × 2 rectangular region. In the pixel-by-pixel case, the foreground portion and the moving portion can be extracted by thresholding the difference value of each pixel. In the grouped case, the similarity between corresponding groups of pixels, for example 2 × 2 rectangular regions, is computed, and the foreground portion and the moving portion can be extracted by thresholding that similarity. The similarity of a group of pixels may be the average of the per-pixel differences, or a correlation value obtained by pattern matching such as normalized correlation.
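
As a rough sketch of the two variants, the following code thresholds per-pixel differences and, alternatively, the mean absolute difference over 2 × 2 blocks. The function names, the grayscale list-of-lists representation, and the threshold values are assumptions for illustration; the normalized-correlation variant is not shown.

```python
def pixel_change_mask(image, reference, threshold):
    """Per-pixel variant: True where |image - reference| exceeds threshold."""
    return [[abs(a - b) > threshold for a, b in zip(row_i, row_r)]
            for row_i, row_r in zip(image, reference)]


def block_change_mask(image, reference, threshold, block=2):
    """Grouped variant: one decision per block x block region, based on the
    mean of the per-pixel absolute differences inside the region."""
    height, width = len(image), len(image[0])
    mask = []
    for y in range(0, height - block + 1, block):
        row = []
        for x in range(0, width - block + 1, block):
            total = sum(abs(image[y + j][x + i] - reference[y + j][x + i])
                        for j in range(block) for i in range(block))
            row.append(total / (block * block) > threshold)
        mask.append(row)
    return mask
```

Passing the background image as `reference` yields a foreground mask, while passing a frame from several frames earlier yields a motion mask. The grouped variant makes a quarter as many threshold decisions for 2 × 2 blocks.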

The feature point existence range may be rectangular, or may have a shape defined according to the shape characteristics of the feature point to be detected, for example an ellipse. For the first frame, or when no feature point was detected in the previous frame, the feature point existence range cannot be estimated in step S32, so the feature point detection process must be performed on all pixels of the captured image. A detection target entering from outside the field of view of the imaging unit 10, however, can be handled by limiting the detection process to, for example, a band a few pixels wide along the outer periphery of the captured image.

As described above, according to Embodiment 2, estimating the feature point existence range in the captured image makes it possible, in the subsequent feature point detection process, to use different detection accuracies for regions where feature point existence ranges overlap and regions where they do not. Remote operation by gestures can thus be realized while reducing the processing load.

In particular, in Embodiment 2, the changed portion of the captured image is obtained from the difference between the captured image acquired by the imaging unit 10 and either the background image or an image several frames earlier, and the feature point existence range is then estimated based on the feature point position in the previous frame. The influence of noise contained in the captured image can therefore be reduced. In addition, processing several pixels together when obtaining the difference shortens the processing time.

Embodiment 3.
FIG. 8 shows examples of gestures to be read by the remote control device. Possible gestures include, for example, gesture (A) in which the user swings the right arm left and right, gesture (B) in which the user swings the right arm up and down, and gesture (C) in which the user rotates the right arm. These gestures can be detected by using the user's two hands 2 and 3 as feature points. In practice, however, if only the hands 2 and 3 are used as feature points, a swing of the arm may be indistinguishable from a movement of the whole body. The user's head 1 (including the face) is therefore also used as a feature point in order to estimate the position of the user's body.

The feature points 1, 2, and 3 all share the characteristic of being skin-colored. When feature points are detected, skin-colored portions of the captured image are therefore first detected within the feature point existence ranges described above. Skin color may be detected by thresholding in RGB space, or by thresholding in another color space obtained by converting RGB space, for example the so-called normalized color space. Feature point candidate points are then detected by recognizing clusters of skin-colored pixels in the result of the skin color detection.
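
A minimal sketch of thresholding in a normalized color space is shown below, assuming the common normalized-rg definition r = R/(R+G+B), g = G/(R+G+B). The function names and the threshold ranges are hypothetical placeholders chosen for the example, not values taken from the patent; real thresholds would have to be tuned for the camera and lighting.

```python
def normalized_rg(pixel):
    """Convert an (R, G, B) pixel to normalized (r, g) chromaticity."""
    red, green, blue = pixel
    total = red + green + blue
    if total == 0:  # pure black carries no chromaticity information
        return 0.0, 0.0
    return red / total, green / total


def is_skin(pixel, r_range=(0.36, 0.47), g_range=(0.28, 0.36)):
    """Threshold test in normalized-rg space (ranges are illustrative only)."""
    r, g = normalized_rg(pixel)
    return r_range[0] <= r <= r_range[1] and g_range[0] <= g <= g_range[1]


def skin_mask(image):
    """Apply is_skin to every pixel of an image given as rows of (R, G, B)."""
    return [[is_skin(pixel) for pixel in row] for row in image]
```

Because the normalized coordinates divide out overall intensity, a pixel and a darker copy of the same pixel map to the same (r, g) pair, which is one reason such a space may be preferred over raw RGB thresholds.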

FIG. 9 is a flowchart showing the flow of the feature point detection process in Embodiment 3 of the present invention. This flowchart is one specific example of the processing of steps S13 to S17 (FIG. 6) executed by the feature point detection unit 15 (FIG. 1) described in Embodiment 1.

In the feature point detection process, the presence or absence of overlap between feature point existence ranges is first detected, as described in Embodiment 1 (step S13). If there is a region where feature point existence ranges overlap, a detection process using the labeling method, which is highly accurate but computationally expensive, is performed on that region so that multiple pixel clusters are detected separately (step S50), yielding multiple "feature point candidate points" (step S51). The labeling method assigns the same label to pixels belonging to the same connected figure and different labels to pixels belonging to different connected figures. Its detection accuracy is high, but its large amount of computation makes the processing load heavy.
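
The labeling step can be illustrated with a standard connected-component labeling routine. The sketch below uses breadth-first flood fill with 4-connectivity over a binary skin mask; the function name and the choice of 4-connectivity are assumptions, since the patent does not specify them.

```python
from collections import deque


def label_components(mask):
    """Assign a distinct positive label to each 4-connected cluster of truthy
    pixels in `mask`. Returns (labels, count); background pixels keep label 0."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # flood-fill the cluster that contains (y, x)
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Every foreground pixel is visited and its neighbors inspected, which is why this kind of processing is comparatively costly and is reserved for the overlapping regions.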

In regions where the feature point existence ranges do not overlap, on the other hand, a detection process using the color histogram method, which is less accurate but requires little computation, is performed (step S52) to detect a single pixel cluster. If only one pixel cluster is detected, that cluster is taken as the "feature point candidate point". If two or more pixel clusters are detected, one of them, for example the one with the largest area, is selected as a single "maximum likelihood feature point candidate point" (step S53). The color histogram method identifies the feature point existence range from the distribution of skin-colored pixels in the captured image. Its small amount of computation keeps the processing load light, but its detection accuracy is inferior to that of the labeling method.
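
The patent does not detail the color histogram method. One plausible low-cost reading, sketched below purely as an assumption, is to project the skin mask onto the horizontal and vertical axes and take the heaviest contiguous run in each projection as the extent of the single candidate. The names `largest_run` and `histogram_candidate` are hypothetical.

```python
def largest_run(histogram, min_count=1):
    """Return (start, end) of the contiguous run of bins with at least
    min_count (min_count >= 1) that has the largest total, or None."""
    best, best_total = None, 0
    start, total = None, 0
    for i, count in enumerate(list(histogram) + [0]):  # sentinel ends last run
        if count >= min_count:
            if start is None:
                start, total = i, 0
            total += count
        elif start is not None:
            if total > best_total:
                best, best_total = (start, i - 1), total
            start = None
    return best


def histogram_candidate(mask):
    """Estimate one bounding box (x_min, y_min, x_max, y_max) from the column
    and row histograms of a binary skin mask, or None if the mask is empty."""
    column_hist = [sum(column) for column in zip(*mask)]
    row_hist = [sum(row) for row in mask]
    xs, ys = largest_run(column_hist), largest_run(row_hist)
    if xs is None or ys is None:
        return None
    return xs[0], ys[0], xs[1], ys[1]
```

Unlike labeling, this needs only one pass to build two one-dimensional histograms, but it cannot separate clusters that overlap along an axis, which is consistent with the lower accuracy the text attributes to the method.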

It is then determined whether the feature point candidate points obtained in step S51 and the feature point candidate point (maximum likelihood feature point candidate point) obtained in step S53 are contiguous. If they are contiguous, they are combined into a single "feature point"; if not, they are treated as separate "feature points" (step S54).

If no region of overlapping feature point existence ranges is found in step S13, only the detection process using the color histogram method, which is less accurate but requires little computation, is performed. A single pixel cluster is detected, yielding one "feature point candidate point" (step S56). If two or more pixel clusters are detected, one of them, for example the one with the largest area, is selected to obtain the "maximum likelihood feature point candidate point" (step S57).

When only the detection process using the color histogram method has been performed, the integration of feature point candidate points (step S54) is unnecessary, so the feature point candidate point (maximum likelihood feature point candidate point) obtained in steps S56 to S57 is used as the "feature point" as it is.

The feature point detection process described here uses skin color information, but a feature point detection process that uses shape information (contour lines and the like) besides skin color information is also possible. For example, contour lines can be extracted in the foreground region and a simple figure such as an ellipse fitted to them, so that the feature point existence range is estimated as the interior of that simple figure.

In the detection process using contour lines, contour lines are detected in the region where feature point existence ranges overlap, yielding multiple "feature point candidate points", each with its outer periphery defined by a contour line. In regions where the feature point existence ranges do not overlap, a single "feature point candidate point" with its outer periphery defined by a contour line is obtained; if multiple pixel clusters are detected, one "maximum likelihood feature point candidate point" is obtained, for example by selecting the one with the largest area. A portion whose outer periphery is defined by the contour of a simple figure can thus be used as an approximation of a feature point candidate point. In this case too, the "feature point" is obtained by integrating the feature point candidate points obtained by the accurate, computationally expensive detection process with the feature point candidate point (maximum likelihood feature point candidate point) obtained by the less accurate, computationally cheap detection process.

In the feature point detection process of Embodiment 3, information on the centroid positions of feature points such as the head and both hands can also be used for detection. That is, information on the centroid positions of the feature points can be stored in advance and used to distinguish feature points such as the head, the right hand, and the left hand from one another. If the feature points in the feature point existence ranges were correctly distinguished in the previous frame, subsequent feature point candidate points are also distinguished correctly. For feature points that were not detected in the previous frame, information such as the following can be stored in advance and used to identify the head, the right hand, and the left hand: the head is usually above both hands, the right hand is usually to the right of the head and the left hand, and the left hand is usually to the left of the head and the right hand.
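
The positional heuristics above could be applied to three candidate centroids roughly as follows. This is only a sketch under stated assumptions: the function name is hypothetical, exactly three candidates are assumed, coordinates are image coordinates with y increasing downward, and "right" and "left" are taken to mean larger and smaller x in the image (the patent does not say whether the camera view is mirrored).

```python
def assign_body_parts(centroids):
    """Label three (x, y) centroids as head, left hand, and right hand.

    Heuristics (assumed reading of the text): the head is the topmost
    candidate; of the remaining two, the one with smaller x is the left hand.
    """
    if len(centroids) != 3:
        raise ValueError("exactly three candidate centroids are expected")
    head_index = min(range(3), key=lambda i: centroids[i][1])  # smallest y
    hands = [c for i, c in enumerate(centroids) if i != head_index]
    left_hand, right_hand = sorted(hands, key=lambda c: c[0])
    return {"head": centroids[head_index],
            "left_hand": left_hand,
            "right_hand": right_hand}
```

Such rules only serve as a fallback when no previous-frame assignment is available; once the parts have been told apart, tracking from the previous frame takes over.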

Besides information on the centroid position of a feature point, a figure in contact with the feature point candidate point (for example, an ellipse or a rectangle) can be used, and the center position, shape, inclination, and so on of this figure can be used to detect the position and posture of the feature point. For example, when a rectangle in contact with the feature point candidate region is used, changes in the shape of the rectangle make it possible to detect not only the position of the feature point but also whether the hand is open or closed. This diversifies the kinds of gesture input. Moreover, when position information alone would produce reading errors, the shape and inclination of the figure can be used to reduce those errors.

As described above, according to Embodiment 3, using color information or shape information (for example, contour lines) makes it possible to detect the hands and head as feature points without complex processing such as template matching.

In addition, using the centroid position and a figure in contact with the feature point for detection diversifies the kinds of gesture input and reduces the error contained in the position information of the feature point.

Embodiment 4.
FIG. 10 is a flowchart showing the flow of the gesture estimation process according to Embodiment 4. Embodiment 4 concerns one specific example of the gesture estimation process (step S19 in FIG. 6) performed by the gesture estimation unit 16 (FIG. 1) described in Embodiment 1. FIG. 11 shows an example of the motion vectors used in the process shown in FIG. 10.

In the gesture estimation process shown in FIG. 10, the motion of a feature point is first detected based on the position information of the feature point detected in the feature point detection process (step S18 in FIG. 6) (step S60). If no motion of the feature point is detected (step S61), the process returns to the detection of feature point motion (step S60). If motion of the feature point is detected, the gesture corresponding to that motion is estimated (step S62). The subsequent processing is as described with reference to FIG. 6 (steps S20 and S21).

When the gesture corresponding to the motion of a feature point is estimated in step S62, the per-frame motion vector 80 of the feature point is approximated by a "motion ID 81", defined by unit vectors in eight directions as shown in FIG. 11: up (1), upper left (2), left (3), lower left (4), down (5), lower right (6), right (7), and upper right (8). The gesture estimation unit 16 (FIG. 1) registers frame-by-frame combinations of these motion IDs 81 in association with the user's gesture patterns.
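
The quantization of a frame-to-frame displacement into one of the eight motion IDs can be sketched as below. The function name, the `min_distance` dead zone, and the use of image coordinates (x to the right, y downward) are assumptions for the example; only the ID numbering follows the text.

```python
import math


def motion_id(dx, dy, min_distance=1.0):
    """Quantize a displacement (dx, dy) in image coordinates (y grows
    downward) to a motion ID: 1 up, 2 upper left, 3 left, 4 lower left,
    5 down, 6 lower right, 7 right, 8 upper right.
    Returns None when the displacement is shorter than min_distance."""
    if math.hypot(dx, dy) < min_distance:
        return None
    # Angle measured counterclockwise on screen, starting from "up".
    angle = math.degrees(math.atan2(-dy, dx)) - 90.0
    sector = round((angle % 360.0) / 45.0) % 8
    return sector + 1
```

For example, `motion_id(0, -5)` yields 1 (up) and `motion_id(5, 0)` yields 7 (right). Storing only a small integer per frame is what keeps the later pattern comparison cheap.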

FIGS. 12(A) and 12(B) show examples of gesture pattern sequences, each a combination of motion IDs 81, stored in the gesture estimation unit 16. For example, gesture A, in which the user moves a hand up and down twice, produces the feature point motion "up, down, up, down" and is therefore associated with the gesture pattern sequence 82 of 1, 5, 1, 5 shown in FIG. 12(A). Gesture B, in which the user moves a hand up, down, left, and right, produces the feature point motion "up, down, left, right" and is therefore associated with the gesture pattern sequence 83 of 1, 5, 3, 7 shown in FIG. 12(B).

In this gesture estimation process, the user's gesture is estimated by comparing the motion vectors of the feature point with the pre-registered gesture pattern sequences 82 and 83. It is preferable to compare the motion vector of the feature point with the gesture pattern sequences frame by frame, and to successively remove sequences that do not match from the set of candidate gesture patterns.

For example, if the motion vector of the feature point is 1 (up) in the first frame and 5 (down) in the second frame, both gestures A and B remain possible. If the motion vector in the third frame is 1 (up), gesture B (whose third element is 3) is no longer possible, so gesture B is excluded from the candidates at the third frame. If the motion vector in the fourth frame is 5, the motion is confirmed to match the pattern of gesture A (1, 5, 1, 5), so the user's gesture is determined to be gesture A. Because the candidates are narrowed down step by step in this way, the gesture estimation process can be carried out simply.
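
The frame-by-frame elimination in this example can be sketched as follows. The dictionary `GESTURE_PATTERNS` holds the two sequences given in the text; the function name and the decision to return the first fully matched pattern are assumptions made for illustration.

```python
GESTURE_PATTERNS = {
    "A": (1, 5, 1, 5),  # up, down, up, down
    "B": (1, 5, 3, 7),  # up, down, left, right
}


def estimate_gesture(motion_ids, patterns=GESTURE_PATTERNS):
    """Consume one motion ID per frame, dropping patterns that stop matching.
    Returns the name of the first pattern matched in full, or None if the
    candidates run out or the input ends before any pattern is complete."""
    candidates = dict(patterns)
    for index, observed in enumerate(motion_ids):
        candidates = {name: sequence for name, sequence in candidates.items()
                      if index < len(sequence) and sequence[index] == observed}
        if not candidates:
            return None
        complete = [name for name, sequence in candidates.items()
                    if len(sequence) == index + 1]
        if complete:
            return complete[0]
    return None
```

With the input `[1, 5, 1, 5]`, gesture B is dropped at the third frame and gesture A is returned at the fourth, mirroring the walk-through above.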

As shown in FIG. 12(C), information on the speed of the gesture motion can also be added to the gesture pattern sequence. The speed of the motion is judged to be high or low by, for example, detecting the magnitude of the displacement between the previous frame and the current frame and comparing it with a preset reference value. In this case, a code H or L, indicating high speed or low speed respectively, can be appended after each motion ID in the gesture pattern sequence. As an example, a gesture in which the user quickly waves a hand up and down can be associated with the gesture pattern sequence 1, H, 5, H, 1, H, 5, H. This increases the number of gesture patterns that can be defined.
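
A sketch of attaching speed codes to a motion ID sequence follows, assuming that H denotes high speed and L low speed (as the quick-wave example 1, H, 5, H, 1, H, 5, H suggests). The function names, the input format of (motion ID, displacement magnitude) pairs, and the rule that a displacement equal to the reference counts as high speed are assumptions.

```python
def speed_code(displacement, reference):
    """Classify a per-frame displacement magnitude against a reference value."""
    return "H" if displacement >= reference else "L"


def encode_with_speed(steps, reference):
    """Turn [(motion_id, displacement), ...] into a flat pattern sequence
    such as [1, "H", 5, "H", ...]."""
    encoded = []
    for motion, displacement in steps:
        encoded.extend([motion, speed_code(displacement, reference)])
    return encoded
```

For instance, with a reference value of 10 pixels per frame, four strokes of 11 to 15 pixels alternating up and down encode to 1, H, 5, H, 1, H, 5, H.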

Because the motion of the user's hand may be ambiguous, it is preferable to set a priority for each gesture pattern sequence. For example, if the user raises the hand only halfway and it is unclear whether the feature point motion is "up, down, up, down" or "upper right, down, upper right, down", the priorities of the gesture pattern sequences "1, 5, 1, 5" and "8, 5, 8, 5" are compared and the one with the higher priority is selected.

To compare the motion vectors of a feature point with the gesture patterns at any time, the start of the user's gesture must be identified. Gesture estimation is therefore performed after confirming that the user's hand has come to rest once.

As described above, according to Embodiment 4, using unit vectors (motion vectors) oriented in a plurality of directions makes it possible to determine the motion of a feature point simply by detecting its direction of motion, so the amount of computation and memory required for gesture estimation can be reduced.

In addition, comparing the magnitude of the displacement from the previous frame with a reference value and estimating gestures based on the speed of motion makes it possible to realize remote operation that supports a wider variety of gesture patterns.

As an example of application, the present invention can be applied to existing digital home appliances so that they can be operated without their existing infrared remote controllers.

FIG. 1 is a block diagram showing the configuration of the remote control device according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an example of a captured image in Embodiment 1 of the present invention.
FIG. 3 is a diagram showing an example of division of a captured image in Embodiment 1 of the present invention.
FIG. 4 is a schematic diagram for explaining the method of dividing a captured image in Embodiment 1 of the present invention.
FIG. 5 is a schematic diagram for explaining the integration of feature point existence ranges according to Embodiment 1 of the present invention.
FIG. 6 is a flowchart showing the overall flow of the remote control process according to Embodiment 1 of the present invention.
FIG. 7 is a flowchart showing the feature point existence range estimation process according to Embodiment 2 of the present invention.
FIG. 8 is a diagram showing examples of gestures according to Embodiment 3 of the present invention.
FIG. 9 is a flowchart showing the flow of the feature point detection process according to Embodiment 3 of the present invention.
FIG. 10 is a flowchart showing the flow of the gesture estimation process according to Embodiment 4 of the present invention.
FIG. 11 is a diagram showing the motion vectors used in the gesture estimation process according to Embodiment 4 of the present invention.
FIG. 12 is a diagram showing examples of the gesture pattern sequences used in the gesture estimation process according to Embodiment 4 of the present invention.

Explanation of symbols

1: gesture recognition unit; 10: imaging unit; 11: image processing unit; 12: recognition processing unit; 13: target device control unit; 14: feature point existence range estimation unit; 15: feature point detection unit; 16: gesture estimation unit; 17: operation signal transmission unit; 20: region boundary; 21: region where a feature point may exist; 22: region where no feature point can exist; 30: existence range of feature point 1; 31: existence range of feature point 2; 32: region where only feature point 1 may exist; 33: region where only feature point 2 may exist; 34: region where feature points 1 and 2 may exist; 80: motion vector; 81: motion ID; 82, 83: gesture pattern sequences.

Claims

A remote control device comprising:
imaging means for acquiring a captured image including an image of a user;
feature point detection means for detecting a feature point from the captured image;
gesture estimation means for estimating the user's gesture based on the detected feature point; and
operation signal transmission means for generating an operation signal corresponding to the gesture and transmitting the operation signal to a device to be remotely operated,
the remote control device further comprising feature point existence range estimation means for estimating, in the captured image, a feature point existence range as a region where the feature point may exist,
wherein the feature point detection means performs a first detection process in a region of the captured image where a plurality of the feature point existence ranges overlap, performs, in a region where the feature point existence ranges do not overlap and only one feature point existence range is present, a second detection process that is less accurate than the first detection process but requires less computation, and detects the feature point by integrating the results of both detection processes.

The remote control device according to claim 1, wherein the feature point detection unit uses a user's head and hand as the feature point.

The remote control device according to claim 1 or 2, wherein the feature point detection means detects the feature point based on color information.

The remote control device according to claim 1 or 2, wherein the feature point detection means detects the feature point based on contour line information.

The remote control device according to any one of claims 1 to 4, wherein the feature point detection means detects the feature point using information on the centroid position of the feature point.

The remote control device according to any one of claims 1 to 4, wherein the feature point detection means detects the feature point using a figure in contact with the feature point.

The remote control device according to any one of claims 1 to 6, wherein the gesture estimation means represents the motion of the feature point using motion vectors that are unit vectors oriented in a plurality of directions.

The remote control device according to claim 7, wherein the gesture estimation means defines a gesture to be estimated using a pattern composed of a sequence of the motion vectors.

The remote control device according to any one of claims 1 to 6, wherein the gesture estimation means represents the motion of the feature point using motion vectors that are unit vectors oriented in a plurality of directions, and represents the speed of the motion of the feature point by the result of a comparison with a reference value.

The remote control device according to claim 9, wherein the gesture estimation means defines a gesture to be estimated by combining a sequence of the motion vectors with the result of comparing the speed of the motion of the feature point with the reference value.

The remote control device according to any one of claims 1 to 10, wherein the gesture estimation means estimates the gesture by successively excluding, from the gesture options, options that are incompatible with the feature point.

A remote operation method comprising:
an imaging step of acquiring a captured image including an image of a user;
a feature point detection step of detecting a feature point from the captured image;
a gesture estimation step of estimating the user's gesture based on the detected feature point; and
an operation signal transmission step of generating an operation signal corresponding to the gesture and transmitting the operation signal to a device to be remotely operated,
the method further comprising a feature point existence range estimation step of estimating, in the captured image, a feature point existence range as a region where the feature point may exist,
wherein, in the feature point detection step,
a first detection process is performed in a region of the captured image where a plurality of the feature point existence ranges overlap, a second detection process that is less accurate than the first detection process but requires less computation is performed in a region where the feature point existence ranges do not overlap and only one feature point existence range is present, and the feature point is detected by integrating the results of both detection processes.

The remote operation method according to claim 12, wherein in the feature point detection step, a user's head and hand are used as the feature points.

The remote operation method according to claim 12 or 13, wherein, in the feature point detection step, the feature point is detected based on color information.

The remote operation method according to claim 12 or 13, wherein, in the feature point detection step, the feature points are detected based on contour information.

The remote operation method according to any one of claims 12 to 15, wherein, in the feature point detection step, the feature point is detected using information on a centroid position of the feature point.

16. The remote operation method according to claim 12, wherein, in the feature point detection step, the feature points are detected using a figure in contact with the feature points.

The remote operation method according to any one of claims 12 to 17, wherein, in the gesture estimation step, the motion of the feature point is represented using motion vectors that are unit vectors oriented in a plurality of directions.

The remote operation method according to claim 18, wherein in the gesture estimation step, a gesture to be estimated is defined using a pattern composed of a sequence of the motion vectors.

The remote operation method according to any one of claims 12 to 17, wherein, in the gesture estimation step, the motion of the feature point is represented using motion vectors that are unit vectors oriented in a plurality of directions, and the speed of the motion of the feature point is represented by the result of a comparison with a reference value.

The remote operation method according to claim 20, wherein, in the gesture estimation step, a gesture to be estimated is defined by combining a sequence of the motion vectors with the result of comparing the speed of the motion of the feature point with the reference value.

The remote operation method according to any one of claims 12 to 21, wherein, in the gesture estimation step, the gesture is estimated by successively excluding, from the gesture options, options that are incompatible with the feature point.