JP6113018B2

JP6113018B2 - Object detection device

Info

Publication number: JP6113018B2
Application number: JP2013160843A
Authority: JP
Inventors: 黒川　高晴; 高晴黒川; 匠宗片
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2013-08-01
Filing date: 2013-08-01
Publication date: 2017-04-12
Anticipated expiration: 2033-08-01
Also published as: JP2015032119A

Description

The present invention relates to a target detection device that detects a detection target from input data.

Conventionally, techniques for detecting a detection target such as a human body from data such as images, audio, or sensor signals have been studied for various purposes, including crime prevention and the capture, search, and organization of photographs or video. In recent years in particular, techniques that detect a detection target from data using a classifier generated by machine learning have been widely studied. For example, a classifier that determines whether a human body appears in an image is generated by extracting feature values from a large number of training images in which a human body appears and a large number of training images in which no human body appears, and then machine-learning a decision boundary that separates, in the feature space, the region in which human-body feature values are distributed from the remaining region. When a feature value extracted from an image is input, the classifier determines whether a human body appears in the image according to which side of the decision boundary the feature value lies on in the feature space.
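The decision rule described above can be illustrated with a minimal sketch. The text does not say what kind of classifier is used, so the linear form below, together with the names `evaluation_value`, `contains_target`, `weights`, and `bias`, is an assumption made only for illustration.

```python
from typing import Sequence


def evaluation_value(features: Sequence[float],
                     weights: Sequence[float],
                     bias: float) -> float:
    """Signed score of a hypothetical linear classifier: w . x + b.

    A score of 0 means the feature vector lies exactly on the decision
    boundary; the sign tells which side of the boundary it lies on.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def contains_target(features: Sequence[float],
                    weights: Sequence[float],
                    bias: float) -> bool:
    """Conventional rule: decide only from the side of the boundary."""
    return evaluation_value(features, weights, bias) > 0.0
```

With this rule, any feature vector that falls on the positive side is accepted, which is why an object that merely resembles a human body can be misclassified.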

However, images that contain a human body and images that do not are both highly varied, and it is generally difficult to find a decision boundary that separates them completely. In particular, a feature value extracted from an image of something that resembles a human body, such as a jacket on a hanger, is likely to lie on the side of the decision boundary where human-body feature values are distributed, so the image may be erroneously determined to contain a human body.

For example, Patent Document 1 proposes an object detection device that trains its classifier to suit the characteristics of the environment in which the device is installed. This object detection device trains the classifier using images from a surveillance camera when the camera is installed.

Patent Document 1: JP 2009-230284 A

By training the classifier with images from the surveillance camera after the camera is installed, the object detection device described in Patent Document 1 can reduce cases in which an object resembling a human body (an example of a detection target), such as a jacket on a hanger at the installation site, is erroneously determined to be a human body. However, if an object resembling the detection target is newly placed at the site after the classifier has been trained, the classifier has not been trained with images showing that object, so the device may erroneously determine that object to be the detection target.

An object detection device can improve its detection accuracy by performing machine learning with many images. However, images showing things other than the detection target are extremely varied, and it is impossible to predict what images will be input. Consequently, for a technique that determines whether the detection target appears in an image according to which side of the decision boundary the extracted feature value lies on in the feature space, it is difficult to train the classifier so that erroneous determinations are completely prevented for all images.

This problem of erroneous determination is not limited to detection with a classifier; it also arises when the detection target is detected by other methods such as pattern matching. Nor is it limited to detection from images; it also arises when the detection target is detected from other kinds of data such as audio or sensor signals.

An object of the present invention is to provide a target detection device that can improve the accuracy with which a detection target is detected from input data.

To solve this problem, the present invention provides a target detection device that determines whether a detection target is included in input data acquired from a data input unit. The target detection device includes: a multi-level data generation unit that generates, from the input data, a plurality of pieces of change data whose information levels, each being the degree to which the data represents the detection target, differ from one another; an evaluation value calculation unit that calculates, for each piece of change data, an evaluation value representing the degree of likelihood of the detection target; an evaluation value series generation unit that generates an evaluation value series by arranging the evaluation values in a preset order defined by the information level; and a target determination unit that determines whether the detection target is included in the input data by using identification information for identifying whether an evaluation value series was generated from input data that includes the detection target.

Preferably, the multi-level data generation unit generates first change data, whose degrees of representing the detection target differ from one another, from the input data by a first change process; generates second change data, whose degrees of representing the detection target differ from one another, from the input data by a second change process different from the first change process; and uses the first change data and the second change data as the change data.

Preferably, the evaluation value series generation unit generates the evaluation value series by smoothing an original series in which the evaluation values are arranged in the preset order defined by the information level.
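The smoothing of the original series can be sketched as follows. This passage does not specify the smoothing method, so the centered moving average, the function name `smooth_series`, and the `window` parameter are assumptions chosen only to make the idea concrete.

```python
from typing import List, Sequence


def smooth_series(original: Sequence[float], window: int = 3) -> List[float]:
    """Smooth an evaluation value series with a centered moving average.

    The window is clipped at both ends of the series, so the output has
    the same length as the input. The moving average itself is an
    illustrative assumption, not a method stated in the text.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(original)):
        lo = max(0, i - half)
        hi = min(len(original), i + half + 1)
        segment = original[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

A window of 1 leaves the series unchanged; larger windows suppress level-to-level fluctuations while keeping the overall shape of the series.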

The target detection device according to the present invention has the effect of improving the accuracy with which a detection target is detected from input data.

FIGS. 1A, 1B, 2A, and 2B are graphs each showing an example of the relationship between information level and evaluation value.
FIG. 3 is a schematic configuration diagram of the monitoring system according to the first embodiment.
FIG. 4 is a schematic configuration diagram of the control unit of the monitoring device according to the first embodiment.
The remaining drawings are, in order: a flowchart showing the target detection process of the monitoring device according to the first embodiment; a schematic diagram for explaining a masking region; two graphs each showing an example of an evaluation value series in which the evaluation values calculated for the respective pieces of change data are arranged in a predetermined order; a flowchart showing the target detection process of the monitoring device according to the second embodiment; and two graphs for explaining an evaluation value series obtained by smoothing the evaluation values.

Hereinafter, a target detection device according to an embodiment of the present invention will be described with reference to the drawings.
The target detection device obtains feature values characteristic of the detection target from the input data to be processed, inputs the obtained feature values to a classifier generated by machine learning to calculate an evaluation value representing the degree of likelihood of the detection target, and determines from the calculated evaluation value whether the detection target is included in the input data. After intensive study, the inventors found that when a plurality of pieces of data with mutually different information levels are generated from a single piece of input data and an evaluation value is calculated from each of them, the way the evaluation value changes as the information level changes has common characteristics whenever the input data includes the detection target. The inventors further found that the way the evaluation value changes with the information level differs greatly between input data that includes the detection target and input data that does not. The information level of data is the degree to which the data represents the features of the detection target: either the level of detail with which the data represents those features (the degree to which they can be represented), or the degree to which the data is suited to representing those features.

FIGS. 1A, 1B, 2A, and 2B show examples of the relationship between the information level of data and the evaluation value representing the degree of likelihood that the data is the detection target. In these figures, images 101, 111, 201, and 211 are the input data, and graphs 100, 110, 200, and 210 show the relationship between information level and evaluation value for the respective input data. In graphs 100, 110, 200, and 210, the horizontal axis is the information level and the vertical axis is the evaluation value. The lower the information level, the less detail with which the data represents the features of the detection target; the higher the information level, the more detail. The data at the highest information level (33) is the original input data. An evaluation value of 0 indicates that the feature value obtained from the input data lies on the decision boundary in the feature space; a positive evaluation value indicates that the feature value lies on the detection-target side of the decision boundary; and a negative evaluation value indicates that it lies on the non-target side. The higher the evaluation value, the more likely the data is the detection target; the lower the value, the less likely.

In FIGS. 1A, 1B, 2A, and 2B, the detection target is a human body. FIG. 1A shows an example in which the input data includes the detection target and the evaluation value for the original input data is positive. FIG. 1B shows an example in which the input data includes the detection target and the evaluation value for the original input data is negative. FIG. 2A shows an example in which the input data does not include the detection target and the evaluation value for the original input data is negative. FIG. 2B shows an example in which the input data does not include the detection target and the evaluation value for the original input data is positive.

As graph 100 in FIG. 1A shows, when the input data includes the detection target, the evaluation value tends to stay unchanged regardless of the information level in the low information level range (1 to 18), to fall as the information level rises in the middle range (18 to 25), and to rise sharply as the information level rises in the high range (25 to 32). Likewise, in the example of graph 110 in FIG. 1B, the evaluation value stays unchanged in the low range (1 to 15), falls in the middle range (15 to 23), and rises sharply in the high range (23 to 32). Thus, when the input data includes the detection target, this tendency appears regardless of the magnitude of the evaluation value itself. That is, it appears whether the feature value obtained from the input data lies on the detection-target side or on the non-target side of the decision boundary in the feature space.

In contrast, as graph 200 in FIG. 2A and graph 210 in FIG. 2B show, when the input data does not include the detection target, the evaluation value changes irregularly as the information level changes, and the tendency seen for input data that includes the detection target does not appear. This tendency is absent regardless of the magnitude of the evaluation value itself. That is, when the input data does not include the detection target, the tendency is absent whether the feature value obtained from the input data lies on the detection-target side or on the non-target side of the decision boundary in the feature space.

Accordingly, the target detection device according to an embodiment of the present invention generates a plurality of pieces of data with mutually different information levels from a single piece of input data and obtains feature values for the detection target from each generated piece of data. The target detection device calculates an evaluation value for each piece of data by inputting the obtained feature values to a classifier trained in advance with feature values obtained from input data that includes the detection target and feature values obtained from input data that does not. The target detection device then generates an evaluation value series in which the calculated evaluation values are arranged in an order preset with respect to the information level (for example, ascending or descending order). Further, the target detection device inputs the generated evaluation value series to a classifier trained in advance with evaluation value series generated from input data that includes the detection target and evaluation value series generated from input data that does not, and thereby determines whether the detection target is included in the input data. In other words, the target detection device determines whether the input data includes the detection target not from which side of the decision boundary the feature value lies on in the feature space, but from how the position of the feature value relative to the decision boundary changes as the information level is changed. The target detection device thereby improves the accuracy with which the detection target is detected from the input data.
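The two-stage flow described above can be sketched as a small pipeline. All names here (`build_evaluation_series`, `detect_from_series`, `make_change_data`, `evaluate`, `classify_series`) are hypothetical; the three callables stand in for the multi-level data generation, the first classifier, and the second classifier trained on evaluation value series, none of which is specified at this level of detail in the text.

```python
from typing import Callable, Iterable, List, TypeVar

Data = TypeVar("Data")


def build_evaluation_series(
    input_data: Data,
    levels: Iterable[int],
    make_change_data: Callable[[Data, int], Data],
    evaluate: Callable[[Data], float],
) -> List[float]:
    """Score one piece of change data per information level.

    The scores are arranged in ascending order of information level,
    which is one of the preset orders mentioned in the text.
    """
    return [evaluate(make_change_data(input_data, level))
            for level in sorted(levels)]


def detect_from_series(
    input_data: Data,
    levels: Iterable[int],
    make_change_data: Callable[[Data, int], Data],
    evaluate: Callable[[Data], float],
    classify_series: Callable[[List[float]], bool],
) -> bool:
    """Decide from the shape of the whole series, not from a single score."""
    series = build_evaluation_series(
        input_data, levels, make_change_data, evaluate)
    return classify_series(series)
```

Passing the stages in as callables keeps the sketch independent of any particular feature extractor or learning method; the key point is that the final decision consumes the entire series.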

Hereinafter, a monitoring system in which a target detection device according to a first embodiment of the present invention is implemented will be described with reference to the drawings.
The monitoring system according to this embodiment detects an intruder who has entered a monitored area and issues an alarm. The monitoring system has a monitoring device and a center device. The monitoring device determines whether a human body appears in an image captured of the monitored area and, upon determining that a human body appears, determines that a human body has entered the monitored area and issues an alarm to the center device. That is, in the monitoring system according to this embodiment, the input data is image data and the detection target is a human body.

FIG. 3 is a diagram showing the schematic configuration of the monitoring system according to this embodiment. As shown in FIG. 3, the monitoring system has one or more monitoring devices 10 and a center device 50 connected to the monitoring devices 10 through a public communication line. When a monitoring device 10 detects an intruder in the monitored area, it transmits an abnormality signal indicating that an intruder has been detected to the center device 50 connected via the public communication line. The monitoring device 10 has an imaging unit 11, an interface unit 12, a communication unit 13, a storage unit 14, and a control unit 15. Each unit of the monitoring device 10 is described in detail below.

The imaging unit 11 is a camera that captures the monitored area at a predetermined cycle (for example, 200 ms). It has, for example, photoelectric conversion elements (for example, a CCD sensor or a C-MOS sensor) that are arranged two-dimensionally and output electrical signals corresponding to the amount of light received, and an imaging optical system that forms an image of the monitored area on the photoelectric conversion elements. The imaging unit 11 is connected to the interface unit 12 and passes the captured images to the interface unit 12 in sequence.
The captured image can be a grayscale or color multi-tone image. In this embodiment, the captured image is a grayscale image of 320 pixels horizontally by 240 pixels vertically with 8-bit luminance resolution. However, captured images with a resolution and gradation other than those of this embodiment may also be used.

The interface unit 12 has an interface circuit connected to the imaging unit 11, for example a video interface or an interface circuit conforming to a serial bus such as Universal Serial Bus. The interface unit 12 is connected to the control unit 15 via, for example, a bus, and sends the captured images received from the imaging unit 11 to the control unit 15.

The communication unit 13 has a communication interface that connects the monitoring device 10 to the public communication line, together with its control circuit, and is connected to the control unit 15 via, for example, a bus. To report that an intruder has been detected in the monitored area, the communication unit 13 performs connection processing between the monitoring device 10 and the center device 50 under the control of the control unit 15. After the connection between the monitoring device 10 and the center device 50 is established, the communication unit 13 transmits the abnormality signal received from the control unit 15 to the center device 50. When transmission of the abnormality signal is complete, the communication unit 13 performs processing to release the connection between the monitoring device 10 and the center device 50.

The storage unit 14 has a semiconductor memory such as a ROM (Read Only Memory) or a RAM (Random Access Memory), a magnetic recording medium and its access device, or an optical recording medium and its access device. The storage unit 14 stores a computer program for controlling the monitoring device 10 and various data, and exchanges this information with the control unit 15. The computer program may be installed in the storage unit 14 from a computer-readable storage medium such as a CD-ROM (Compact Disk Read Only Memory) or a DVD-ROM (Digital Versatile Disk Read Only Memory). The various data include reference data for the human body and reference data for evaluation value series.

The control unit 15 is an example of the target detection device and has at least one processor, such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), or an MCU (Micro Control Unit), together with its peripheral circuits. The control unit 15 stores the captured image received from the interface unit 12 in the storage unit 14. The control unit 15 then reads the captured image from the storage unit 14, determines whether a human body appears in it, and, upon determining that a human body appears, issues an alarm to the center device 50 via the communication unit 13.

FIG. 4 is a diagram showing the schematic configuration of the control unit 15. As shown in FIG. 4, the control unit 15 has, as functional modules implemented by software running on the processor, a data input unit 150, a cutout unit 151, a multi-level data generation unit 152, an evaluation value calculation unit 153, an evaluation value series generation unit 154, a target determination unit 155, and a notification control unit 156.
Each of these units of the control unit 15 may instead be implemented as an independent integrated circuit, firmware, a microprocessor, or the like.
Each unit of the control unit 15 is described in detail below.

Each time a captured image is stored in the storage unit 14, the data input unit 150 reads the captured image from the storage unit 14 and sends it to the cutout unit 151.

Each time the cutout unit 151 receives a captured image from the data input unit 150, it sequentially cuts out images of a predetermined size to be subjected to detection processing from the captured image and sends the cut-out images to the multi-level data generation unit 152. The cutout unit 151 sets a plurality of cutout positions in the captured image, sets a plurality of cutout sizes within the range of human body sizes expected in the captured image given the installation state of the imaging unit 11 and individual differences among human bodies entering the monitored area, and sequentially cuts out as many partial images as there are combinations of the set positions and sizes. For example, the cutout size can be set to 64 pixels horizontally by 128 pixels vertically. The cutout positions can be set to positions shifted horizontally from the upper left corner of the captured image in steps of half the horizontal length of the cutout image, and to positions shifted further vertically from those positions in steps of half the vertical length of the cutout image. When the imaged area is wide, for example, the cutout size may be changed according to the cutout position. In that case, the cutout size can be made larger the lower the cutout position is in the captured image, where regions close to the imaging unit 11 appear, and smaller the higher the cutout position is in the captured image, where regions far from the imaging unit 11 appear. If processing time permits, the cutout position may be shifted by one pixel at a time in the horizontal and vertical directions, and the shift width can be chosen as appropriate. Hereinafter, an image cut out by the cutout unit 151 is referred to as a partial image. In this embodiment, the partial image is an example of the input data.
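The half-window stepping of the cutout positions can be sketched as follows, using the example sizes from the text (a 64 × 128 pixel window on a 320 × 240 pixel captured image). The function names `cutout_positions` and `cut_out`, and the representation of an image as a list of pixel rows, are assumptions made for illustration; the sketch handles a single window size and ignores position-dependent sizes.

```python
from typing import Iterator, List, Sequence, Tuple


def cutout_positions(image_w: int, image_h: int,
                     win_w: int = 64, win_h: int = 128
                     ) -> Iterator[Tuple[int, int]]:
    """Yield (left, top) corners, stepping by half the window size.

    Only windows that fit entirely inside the image are produced.
    """
    step_x = max(1, win_w // 2)
    step_y = max(1, win_h // 2)
    for top in range(0, image_h - win_h + 1, step_y):
        for left in range(0, image_w - win_w + 1, step_x):
            yield left, top


def cut_out(image: Sequence[Sequence[int]], left: int, top: int,
            win_w: int, win_h: int) -> List[List[int]]:
    """Return the partial image at (left, top) as a new list of rows."""
    return [list(row[left:left + win_w]) for row in image[top:top + win_h]]
```

For a 320 × 240 image this yields nine horizontal positions (0 to 256 in steps of 32) and two vertical positions (0 and 64), that is, 18 partial images per window size.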

The multi-level data generation unit 152 generates, from a single piece of input data, a plurality of pieces of change data whose information levels differ from one another, and sends the generated change data to the evaluation value calculation unit 153 in association with their information levels. Hereinafter, a plurality of pieces of change data with mutually different information levels is referred to as multi-level data. The multi-level data generation unit 152 according to the present embodiment generates, from a partial image cut out from the captured image (the input data), a plurality of images with mutually different information levels (hereinafter referred to as individual level images), and outputs the original partial image and the generated individual level images as the multi-level data.
In the present embodiment, an example is described in which the information level of data is the degree of detail with which the data represents the features of the detection target (the degree to which the features can be represented). The multi-level data generation unit 152 generates, as the multi-level data, a plurality of images of mutually different sharpness, where a higher information level corresponds to higher sharpness and a lower information level to lower sharpness. It generates the individual level images by applying averaging to the partial image to lower its sharpness. For example, it applies to the partial image averaging filters whose filter size is smaller for higher information levels and larger for lower information levels, thereby generating a plurality of individual level images of mutually different sharpness. When the filter size of the averaging filter is 4(n-1)+1 (where n is, for example, an integer from 2 to 33), each pixel of the individual level image is set to the average of the pixel values within the (4(n-1)+1) pixel × (4(n-1)+1) pixel range centered on the corresponding pixel of the partial image. In other words, the larger the filter size, the narrower the passband of the averaging filter, so the individual level image becomes more blurred and less sharp; the degree to which it can represent the features of a human body therefore falls, and its information level is lower. For example, the information level is defined in 33 steps from 1 to 33. At information level 1 the filter size is set to 129 (n = 33), and each time the information level increases by 1 the filter size decreases by 4 (n decreases by 1). At the highest information level, 33, the original partial image with no averaging filter applied is used.
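The mapping from information level to filter size, and the averaging itself, can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function names are hypothetical, the image is assumed to be a 2-D grayscale NumPy array, and the border handling (edge replication) is an assumption, since the text does not say how pixels near the image border are averaged.

```python
import numpy as np

NUM_LEVELS = 33  # information levels 1..33, as in the text


def filter_size(level):
    """Side length of the averaging filter for an information level.

    Level 1 gives 129 (n = 33); each step up shrinks the size by 4.
    Level 33 gives 1, which leaves the partial image unchanged.
    """
    if not 1 <= level <= NUM_LEVELS:
        raise ValueError("level must be in 1..33")
    n = NUM_LEVELS + 1 - level
    return 4 * (n - 1) + 1


def box_filter(image, size):
    """Mean over a size x size window centred on each pixel.

    Border handling by edge replication is an assumption.
    """
    img = image.astype(np.float64)
    if size == 1:
        return img
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    # Summed-area table with a leading row and column of zeros.
    sat = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    sat[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (sat[size:size + h, size:size + w] - sat[:h, size:size + w]
             - sat[size:size + h, :w] + sat[:h, :w])
    return total / (size * size)


def multilevel_images(partial):
    """Individual level images keyed by information level (1..33)."""
    return {lv: box_filter(partial, filter_size(lv))
            for lv in range(1, NUM_LEVELS + 1)}
```

The summed-area table keeps the cost per pixel independent of the filter size, which matters here because the largest filter covers 129 × 129 pixels.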

The evaluation value calculation unit 153 calculates, for each piece of change data included in the multi-level data, an evaluation value representing the degree of likelihood of the detection target, and sends each evaluation value to the evaluation value series generation unit 154 in association with its information level. The evaluation value calculation unit 153 according to the present embodiment has a Real-Adaboost classifier that uses HOG (Histograms of Oriented Gradients) feature amounts as the human body feature amounts, that is, one or more feature amounts useful for identifying a human body. The evaluation value calculation unit divides the input image into a plurality of blocks and further divides each block into a plurality of cells. For example, each cell is a rectangular region of 5 pixels × 5 pixels, and each block is a rectangular region of 3 cells × 3 cells. The evaluation value calculation unit then calculates the gradient direction and gradient strength of the pixel value at each pixel of the input image. Because the sense of the gradient need not be considered, the gradient direction is calculated in the range 0° to 180° and is quantized into, for example, eight directions of 22.5° each. For each cell, the evaluation value calculation unit obtains a histogram whose frequency for each gradient direction is the sum of the gradient strengths in that direction, and takes the histograms normalized block by block as the HOG feature amount.
The Real-Adaboost classifier consists of a plurality of weak classifiers and a strong classifier that makes a determination by integrating the outputs of the weak classifiers. Each weak classifier calculates a feature amount from the image using the Haar-like feature amount determined in advance for that weak classifier. Each weak classifier outputs a higher value the more likely it is that a human body appears in the corresponding partial image, and a lower value the less likely. The strong classifier outputs the sum of the output values of the weak classifiers as the evaluation value. This evaluation value takes at least three distinct values and is, for example, continuous.
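The HOG computation described above can be sketched as follows. The cell size, block size, and eight unsigned 22.5° directions come from the text; everything else is an assumption made for illustration, including the function names, the use of central differences for the gradient, blocks that overlap with a stride of one cell, and L2 normalization within each block.

```python
import numpy as np

CELL = 5     # cell side in pixels (from the text)
BLOCK = 3    # block side in cells (from the text)
N_BINS = 8   # eight unsigned directions of 22.5 degrees each


def cell_histograms(image):
    """Per-cell histograms of gradient strength over unsigned direction."""
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)                 # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0   # ignore gradient sense
    bins = np.minimum((angle // (180.0 / N_BINS)).astype(int), N_BINS - 1)
    n_cy, n_cx = img.shape[0] // CELL, img.shape[1] // CELL
    hist = np.zeros((n_cy, n_cx, N_BINS))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * CELL, (cy + 1) * CELL)
            cols = slice(cx * CELL, (cx + 1) * CELL)
            hist[cy, cx] = np.bincount(bins[rows, cols].ravel(),
                                       weights=magnitude[rows, cols].ravel(),
                                       minlength=N_BINS)
    return hist


def hog_features(image, eps=1e-10):
    """Concatenate block-normalized cell histograms (L2 norm is assumed)."""
    hist = cell_histograms(image)
    n_cy, n_cx, _ = hist.shape
    feats = []
    for by in range(n_cy - BLOCK + 1):
        for bx in range(n_cx - BLOCK + 1):
            v = hist[by:by + BLOCK, bx:bx + BLOCK].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps))
    return np.concatenate(feats) if feats else np.zeros(0)
```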

The adjacent rectangular region whose HOG feature amount is used as each weak classifier is determined by pre-learning from a plurality of learning human body images in which a human body appears and a plurality of learning non-human body images in which no human body appears (hereinafter, the learning human body images and the learning non-human body images are collectively referred to as learning images). The outline of the learning procedure is as follows.
(1) The computer that performs the pre-learning sets a plurality of blocks in the image region of each learning image and calculates the HOG feature amount for each block.
(2) The computer determines a weight for each learning image. The initial weight is the same for every learning image.
(3) For each block, and for each HOG feature amount of that block, the computer calculates the probability density distribution W₊^j of the learning human body images from the weights set for the learning human body images. Here, j is an index corresponding to the value of the HOG feature amount. Similarly, for each block, and for each value of the HOG feature amount of that block, the computer calculates the probability density distribution W₋^j of the learning non-human body images from the weights set for the learning non-human body images. The computer may quantize each HOG feature amount by dividing its range of possible values into a plurality of intervals before calculating W₊^j and W₋^j.
(4) For each HOG feature amount, the computer calculates an evaluation value Z from the probability density distribution W₊^j of the learning human body images and the probability density distribution W₋^j of the learning non-human body images by equation (1).
[Equation (1)]
The smaller this degree of coupling Z, the more separated the distribution of the learning human body images is from the distribution of the learning non-human body images. The computer therefore selects the HOG feature amount of the block that minimizes the evaluation value Z as one weak classifier. The output h(x) of the weak classifier is given by equation (2).
[Equation (2)]
Here, x is the value of the HOG feature amount input to the weak classifier, and ε is a constant (for example, 10⁻¹⁰) that prevents the denominator from becoming zero. As equation (2) shows, the weak classifier outputs a larger value the larger the probability density distribution W₊^j of the learning human body images corresponding to the input value (HOG feature amount), and a smaller value the larger the probability density distribution W₋^j of the learning non-human body images corresponding to the input value. The weak classifier outputs a positive value when W₊^j is greater than W₋^j, a negative value when it is smaller, and 0 when the two are equal.
(5) The computer increases the weights of the learning images that the weak classifier using the selected HOG feature amount failed to identify, and decreases the weights of the learning images that it identified successfully. It then normalizes the weights so that the weights of all learning images sum to 1.
(6) The computer repeats steps (3) to (5) a predetermined number of times.
Information on the HOG feature amount used as each weak classifier determined in this way, information representing its block, and information representing the output function of each weak classifier are stored in the storage unit 14 as reference data of the human body.
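Equations (1) and (2) referred to in the learning procedure do not appear in this text. In the standard Real AdaBoost formulation, which agrees with every property stated above (a smaller Z means better-separated distributions, ε guards the denominator, and the output is positive, negative, or zero according to whether W₊^j is greater than, less than, or equal to W₋^j), they would take the following form. This is a reconstruction under that assumption, and constant factors may differ from the original equations.

```latex
Z = \sum_{j} \sqrt{W_{+}^{j}\, W_{-}^{j}} \qquad (1)

h(x) = \frac{1}{2} \ln \frac{W_{+}^{j} + \varepsilon}{W_{-}^{j} + \varepsilon} \qquad (2)
```

In equation (2), j is the index of the quantization interval into which the input value x falls.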

The evaluation value calculation unit 153 takes the sum of the output values of the weak classifiers selected in (4) as the evaluation value. The evaluation value is positive when the HOG feature amount lies on the human body side of the identification boundary in the feature amount space and negative when it lies on the non-human body side; its absolute value is larger the farther the position is from the identification boundary and smaller the closer it is.
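Learning steps (2) to (6) and the summation performed by the strong classifier can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the embodiment: the criterion Z and the weak classifier output use the standard Real AdaBoost forms, the weight update w ← w·exp(−y·h(x)) is the standard rule and is only assumed to match step (5), each feature is assumed to be already quantized into bin indices, and all function names are hypothetical.

```python
import numpy as np

EPS = 1e-10  # the constant epsilon from the text


def weighted_distributions(bin_idx, labels, weights, n_bins):
    """W+^j and W-^j for one quantized feature (weighted class histograms)."""
    pos, neg = labels == 1, labels == -1
    w_pos = np.bincount(bin_idx[pos], weights=weights[pos], minlength=n_bins)
    w_neg = np.bincount(bin_idx[neg], weights=weights[neg], minlength=n_bins)
    return w_pos, w_neg


def train(bins, labels, n_bins, rounds):
    """bins: (n_samples, n_features) int array of quantized feature values.
    labels: +1 for learning human body images, -1 for non-human body images.
    Returns a list of (feature index, lookup table of h) pairs."""
    n_samples, n_features = bins.shape
    weights = np.full(n_samples, 1.0 / n_samples)          # step (2)
    model = []
    for _ in range(rounds):                                # step (6)
        best = None
        for f in range(n_features):                        # steps (3) and (4)
            w_pos, w_neg = weighted_distributions(bins[:, f], labels,
                                                  weights, n_bins)
            z = np.sum(np.sqrt(w_pos * w_neg))             # assumed form of Z
            if best is None or z < best[0]:
                table = 0.5 * np.log((w_pos + EPS) / (w_neg + EPS))
                best = (z, f, table)
        _, f, table = best
        model.append((f, table))
        # Step (5): misclassified samples gain weight (assumed update rule).
        weights = weights * np.exp(-labels * table[bins[:, f]])
        weights /= weights.sum()
    return model


def evaluation_value(model, sample_bins):
    """Strong classifier output: sum of the weak classifier outputs."""
    return float(sum(table[sample_bins[f]] for f, table in model))
```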

The evaluation value calculation process is described below with reference to FIGS. 1A, 1B, 2A, and 2B mentioned above. The input data 101, 111, 201, and 211 shown in FIGS. 1A, 1B, 2A, and 2B are examples of partial images. The graphs 100, 110, 200, and 210 shown in those figures are examples of the relationship between the information level and the evaluation value calculated from the individual level image generated by applying the averaging filter corresponding to that information level to the partial image.

The partial image 101 shown in FIG. 1A clearly shows a human body, and the partial image 201 shown in FIG. 2A clearly shows no human body. In the graph 100 of FIG. 1A and the graph 200 of FIG. 2A, the evaluation value at the highest information level, 33, that is, the evaluation value for the original partial image with no averaging filter applied, has the correct sign. For the partial images 101 and 201, therefore, whether a human body appears can be determined correctly even from the evaluation value for the original partial image alone.

By contrast, the partial image 111 shown in FIG. 1B shows a human body, but other people and complex patterns appear in the background, so it is not clear from the image that a human body appears. Likewise, the partial image 211 shown in FIG. 2B shows no human body but does show a building 212 that is easily mistaken for one, so it is not clear from the image that no human body appears. In the graph 110 of FIG. 1B and the graph 210 of FIG. 2B, the evaluation value at the highest information level, 33, that is, the evaluation value for the original partial image with no averaging filter applied, has the incorrect sign. For the partial images 111 and 211, therefore, it is difficult to determine correctly from the evaluation value for the original partial image whether a human body appears.

On the other hand, in the graph 100 of FIG. 1A, the evaluation value does not change with the information level in the low information level range (1 to 18), falls as the information level rises in the middle range (18 to 25), and rises sharply as the information level rises in the high range (25 to 32). Similarly, in the graph 110 of FIG. 1B, the evaluation value does not change in the low range (1 to 15), falls in the middle range (15 to 23), and rises sharply in the high range (23 to 32). No such tendency is seen in the graph 200 of FIG. 2A or the graph 210 of FIG. 2B.
Therefore, by determining whether the change in the evaluation value with the information level shows the tendency seen when a partial image contains a human body, the control unit 15 can correctly determine, for all of the partial images 101, 111, 201, and 211, whether a human body appears.

The evaluation value series generation unit 154 generates an evaluation value series in which the evaluation values calculated for the pieces of change data included in the multi-level data are arranged in an order set in advance with respect to the information level, and passes the generated evaluation value series to the target determination unit 155. This order is defined in terms of the information level. For example, when the order is ascending information level and there are 33 information levels from 1 to 33, the evaluation value series generation unit 154 generates, as the evaluation value series, the 33-dimensional vector (V₁, V₂, V₃, ..., V₃₃) in which the evaluation values V₁ to V₃₃ calculated for the individual level images of the respective information levels are arranged in ascending order. The order may instead be descending information level or some other order, as long as the order used in pre-learning and the order used in target detection are the same.

The target determination unit 155 has a discriminator (hereinafter referred to as a sequence discriminator) that, when an evaluation value series is input, outputs identification information for identifying whether that evaluation value series was generated from input data containing the detection target. The target determination unit 155 determines whether the input data contains the detection target from the identification information output when the evaluation value series received from the evaluation value series generation unit 154 is input to the sequence discriminator, and outputs the determination result. In the present embodiment, an example is described in which the sequence discriminator outputs, as the identification information, the identification result of whether the evaluation value series was generated from input data containing the detection target.

The target determination unit 155 according to the present embodiment uses a support vector machine as the sequence discriminator. The computer that performs the pre-learning does so using evaluation value series generated from a plurality of learning human body images and evaluation value series generated from a plurality of learning non-human body images. This computer calculates a hyperplane discriminant function for determining, when an evaluation value series generated from a particular image is input, whether a human body appears in that image. The hyperplane discriminant function g(x) is given by the following equation.
g(x) = w^t x + b    (3)
Here, x is the vector of the evaluation value series input to the sequence discriminator, w is a weight vector, and b is a bias term. The weight vector w is calculated so as to maximize the margin between the human body class and the non-human body class. The hyperplane discriminant function of the present embodiment outputs a value that is positive when the input evaluation value series lies on the human body side of the identification boundary in the feature amount space and negative when it lies on the non-human body side, with an absolute value that is larger the farther the position is from the identification boundary and smaller the closer it is. In other words, the output value of the hyperplane discriminant function represents the likelihood that the evaluation value series was generated from input data containing the detection target, that is, the degree to which the change of the evaluation values calculated from the input data with the information level exhibits the tendency, described above, that is seen when the input data contains the detection target.
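Building the evaluation value series and evaluating equation (3) on it can be sketched as follows. The function names are hypothetical, and the weight vector w and bias b are assumed to have been obtained beforehand by training a linear support vector machine; the training itself is not shown.

```python
import numpy as np


def evaluation_series(values_by_level, order=None):
    """Arrange per-level evaluation values into a vector.

    values_by_level maps information level -> evaluation value.
    The default order is ascending information level; any other fixed
    order may be passed, provided training and detection use the same one.
    """
    levels = sorted(values_by_level) if order is None else list(order)
    return np.array([values_by_level[lv] for lv in levels], dtype=np.float64)


def hyperplane_discriminant(x, w, b):
    """g(x) = w^t x + b from equation (3)."""
    return float(np.dot(w, x) + b)
```

For instance, `evaluation_series({2: 0.5, 1: -1.0, 3: 2.0})` yields the vector (-1.0, 0.5, 2.0), and with w = (1, 0, -1) and b = 0.5 the discriminant evaluates to -2.5.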

The sequence discriminator compares the output value of the hyperplane discriminant function with a preset determination threshold. If the output value is greater than the determination threshold, the sequence discriminator outputs an identification result (for example, 1) indicating that the input evaluation value series was generated from input data containing the detection target. If the output value is equal to or less than the determination threshold, the sequence discriminator outputs an identification result (for example, 0) indicating that the input evaluation value series was not generated from input data containing the detection target, that is, that it was generated from input data not containing the detection target.
The determination threshold is set to the lower limit of the output value of the hyperplane discriminant function when the input data contains the detection target. For example, the threshold can be the average of two means obtained in prior experiments: the mean of the output values calculated for a plurality of test human body images in which a human body appears, and the mean of the output values calculated for a plurality of test non-human body images in which no human body appears. Alternatively, the maximum of the output values calculated for the test non-human body images, or the minimum of the output values calculated for the test human body images, may be used as the determination threshold. That is, a value greater than the determination threshold indicates that the evaluation value series was generated from input data containing the detection target, and a value equal to or less than the determination threshold indicates that it was not.
Information representing the weight vector w and the bias term b of the hyperplane discriminant function determined by the pre-learning, together with the determination threshold, is stored in the storage unit 14 as reference data for the evaluation value series.
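The first threshold choice described above (the average of the two class means) and the comparison against it can be sketched as follows. The function names are hypothetical, and the inputs are assumed to be the hyperplane discriminant outputs already computed for the test images.

```python
import numpy as np


def midpoint_threshold(human_outputs, non_human_outputs):
    """Average of the mean output for test human body images and the
    mean output for test non-human body images."""
    return 0.5 * (np.mean(human_outputs) + np.mean(non_human_outputs))


def identification_result(output, threshold):
    """1 if the output exceeds the threshold (detection target present),
    0 if it is equal to or less than the threshold."""
    return 1 if output > threshold else 0
```

Note that an output exactly equal to the threshold yields 0, matching the "equal to or less than" wording of the text.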

The target determination unit 155 inputs the evaluation value series received from the evaluation value series generation unit 154 to the sequence discriminator learned in advance. If the identification result output from the sequence discriminator indicates that the evaluation value series was generated from input data containing the detection target, the target determination unit 155 determines that a human body appears in the partial image; if it indicates that the evaluation value series was not generated from input data containing the detection target, the target determination unit 155 determines that no human body appears in the partial image.

Instead of outputting, as the identification information, the identification result of whether the evaluation value series was generated from input data containing the detection target, the sequence discriminator may output the output value of the hyperplane discriminant function. In that case, the target determination unit 155 determines that a human body appears in the partial image if the value output from the sequence discriminator is greater than the determination threshold, and determines that no human body appears in the partial image if the value is equal to or less than the determination threshold.

When the target determination unit 155 determines that a human body appears in any of the partial images, the notification control unit 156 transmits an abnormality signal to the center device 50 via the communication unit 13.

The operation of the target detection process performed by the monitoring device 10 according to the present embodiment is described below with reference to the flowchart shown in FIG. 5. The operation flow described below is controlled by the control unit 15 in accordance with a program that is stored in the storage unit 14 and read by the control unit 15.

First, the control unit 15 causes the imaging unit 11 to capture the monitoring region, acquires the captured image via the interface unit 12, and stores it in the storage unit 14. The data input unit 150 then reads the captured image from the storage unit 14 and sends it to the cutout unit 151 (step S501). Next, the cutout unit 151 cuts out a partial image from the acquired captured image and sends the partial image to the multi-level data generation unit 152 (step S502). The control unit 15 executes the processing of steps S502 to S512 once for each partial image cut out by the cutout unit 151.

Next, the control unit 15 sets an information level (step S503). The control unit 15 sets the predetermined information levels in order from the lowest and executes the processing of steps S503 to S507 once for each information level to be set.

The multi-level data generation unit 152 generates the individual level image corresponding to the information level set in step S503 and sends the generated individual level image to the evaluation value calculation unit 153 in association with the information level (step S504). If the information level is not the maximum value, the multi-level data generation unit 152 generates the individual level image corresponding to that information level from the partial image and sends it to the evaluation value calculation unit 153. If the information level is the maximum value, the multi-level data generation unit 152 sends the partial image as it is to the evaluation value calculation unit 153.

Next, the evaluation value calculation unit 153 extracts the human body feature amounts from the individual level image received from the multi-level data generation unit 152 (step S505). The evaluation value calculation unit 153 then calculates an evaluation value from the extracted human body feature amounts and sends the calculated evaluation value to the evaluation value series generation unit 154 in association with the information level (step S506).

Next, the control unit 15 determines whether the processing of steps S503 to S506 has been executed for all information levels (step S507). If it has not (NO in step S507), the control unit 15 returns the processing to step S503 and repeats steps S503 to S506. If it has (YES in step S507), the evaluation value series generation unit 154 generates an evaluation value series in which the evaluation values received so far from the evaluation value calculation unit 153 are arranged in the predetermined order (step S508).

Next, the target determination unit 155 inputs the evaluation value series generated by the evaluation value series generation unit 154 to the sequence discriminator learned in advance and acquires the identification result output from the sequence discriminator (step S509). The target determination unit 155 then determines whether the identification result output from the sequence discriminator indicates that the evaluation value series was generated from input data containing the detection target (step S510). If it does (YES in step S510), the target determination unit 155 determines that the partial image contains a human body, and the notification control unit 156 transmits an abnormality signal to the center device 50 via the communication unit 13 (step S511). Once the abnormality signal has been output, the control unit 15 ends the series of steps.

If, on the other hand, the identification result indicates that the evaluation value series was not generated from input data containing the detection target (NO in step S510), the target determination unit 155 does not determine that the partial image contains a human body. In this case, the control unit 15 determines whether partial images of all predetermined positions and sizes have been cut out (step S512). If not all partial images have been cut out (NO in step S512), the control unit 15 returns the processing to step S502 and repeats steps S502 to S512. If all partial images have been cut out (YES in step S512), the control unit 15 ends the series of steps on the basis that no intruder was detected.
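The control flow of steps S502 to S512 can be sketched as a pair of nested loops. This is only a structural sketch: the three callables stand in for the multi-level data generation unit, the evaluation value calculation unit, and the sequence discriminator, and their names and signatures are hypothetical.

```python
def detect(partial_images, make_level_image, evaluate, classify_series, levels):
    """Return True as soon as one partial image is judged to contain a human.

    partial_images  : iterable of partial images (S502, repeated via S512)
    make_level_image: (partial image, level) -> individual level image (S504)
    evaluate        : individual level image -> evaluation value (S505, S506)
    classify_series : list of evaluation values -> 1 or 0 (S509, S510)
    levels          : information levels, ordered from lowest to highest
    """
    for partial in partial_images:
        series = []
        for level in levels:                      # S503, repeated via S507
            level_image = make_level_image(partial, level)
            series.append(evaluate(level_image))
        # S508: the series is already in ascending-level order here.
        if classify_series(series) == 1:
            return True       # S511: the abnormality signal would be sent
    return False              # all partial images processed, no intruder
```

Returning at the first positive result mirrors the flowchart, in which the series of steps ends once the abnormality signal has been output.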

If, for example, the imaging unit 11 is installed so that a human body exactly fits within the captured image, the control unit 15 need not cut out partial images from the captured image and may generate the individual level images directly from the captured image. In that case, the cutout unit 151 is omitted from the control unit 15, and the processing of steps S502 and S512 is omitted from the flowchart of FIG. 5.

As described above, the monitoring device according to the present embodiment generates a plurality of individual level images with mutually different information levels from a partial image cut out of a captured image of the monitored area, or from the captured image itself, and calculates an evaluation value from each individual level image. The monitoring device then generates an evaluation value sequence in which the calculated evaluation values are arranged in a predetermined order according to the information level, inputs that sequence to a sequence discriminator trained in advance, and determines whether the input data contains the detection target. This allows the monitoring system to detect a human body in a captured image more accurately.

Furthermore, the monitoring device according to the present embodiment does not determine whether a human body appears in the partial image from the evaluation value itself, but from the degree to which the evaluation value changes as the information level changes. In other words, the monitoring device does not decide based on whether the feature extracted from the partial image lies on the human-body side or the non-human-body side of the identification boundary in the feature space; it detects a human body based on how the position of the feature relative to the identification boundary changes as the information level changes. The identification boundary itself therefore does not need to be learned with high accuracy, so a large number of training images need not be collected, and the development efficiency of the device can be improved.

In a first modification of the first embodiment, the evaluation value calculation unit uses Haar-like features instead of HOG features as the human body features. A Haar-like feature is the luminance difference between a plurality of adjacent rectangular regions set arbitrarily within an image region. Details of Haar-like features are disclosed in, for example, Paul Viola and Michael Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features", IEEE CVPR, vol.1, pp.511-518, 2001.

When Haar-like features are used as the human body features, each weak classifier of the evaluation value calculation unit receives the Haar-like feature computed for its predetermined adjacent rectangular regions and, based on that feature, outputs a value that is higher the more likely it is that a human body appears in the corresponding partial image and lower the less likely it is. Which adjacent rectangular regions supply the Haar-like feature for each weak classifier is determined by prior learning. The learning procedure is the same as when HOG features are used, so its description is omitted. Information representing the Haar-like feature used for each weak classifier as determined by prior learning, information representing the adjacent rectangular regions, and information representing the output function of each weak classifier are stored in the storage unit as reference data for the human body.
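As an illustration of the luminance difference between adjacent rectangular regions, the following minimal sketch computes a two-rectangle Haar-like feature using an integral image (summed-area table), the technique described in the cited Viola and Jones paper. The function names and the left-minus-right layout are assumptions made for this example; the description does not specify which rectangle arrangements are used.

```python
from typing import List

Image = List[List[float]]


def integral_image(img: Image) -> List[List[float]]:
    """Summed-area table with one extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def rect_sum(sat: List[List[float]], top: int, left: int, height: int, width: int) -> float:
    """Sum of the pixels in a rectangle, read from the summed-area table."""
    bottom, right = top + height, left + width
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


def haar_two_rect(img: Image, top: int, left: int, height: int, width: int) -> float:
    """Luminance sum of a rectangle minus that of the equal-sized rectangle to its right."""
    sat = integral_image(img)
    left_sum = rect_sum(sat, top, left, height, width)
    right_sum = rect_sum(sat, top, left + width, height, width)
    return left_sum - right_sum
```

In practice the integral image would be computed once per partial image and reused for every weak classifier; it is recomputed inside `haar_two_rect` here only to keep the sketch short.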

In a second modification of the first embodiment, the multi-level data generation unit generates a plurality of modified data with mutually different sharpness by changing the number of times an averaging filter is applied. In that case, the multi-level data generation unit uses an averaging filter of fixed size (for example, 3×3) to generate an image filtered once, an image filtered twice, ..., and an image filtered n times. Because the image becomes more blurred each time the filter is applied, sharpness is lower the more times the filter is applied and higher the fewer times it is applied.

Alternatively, the multi-level data generation unit may generate a plurality of images with mutually different sharpness by changing the filter coefficients of the averaging filter. In that case, the multi-level data generation unit weights a filter of fixed size (for example, 5×5) so that the weight is larger closer to the center of the filter, and generates the images using filters whose weightings differ from one another. The more gradual the change in weight from the edge of the filter to its center, the lower the sharpness of the resulting image; the steeper the change, the higher the sharpness.
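The repeated-filtering variant can be sketched as follows. This is a minimal sketch: the function names are hypothetical, and the border handling (averaging only over the neighbours that exist) is an assumption, since the description does not say how image borders are treated.

```python
from typing import List

Image = List[List[float]]


def box_blur_3x3(img: Image) -> Image:
    """One pass of a 3x3 averaging filter; border pixels average the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out


def sharpness_levels(img: Image, n: int) -> List[Image]:
    """Return the images filtered 1, 2, ..., n times, sharpest first."""
    levels, current = [], img
    for _ in range(n):
        current = box_blur_3x3(current)
        levels.append(current)
    return levels
```

Each later element of the returned list is a more blurred image and therefore corresponds to a lower information level.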

A third modification of the first embodiment is an example in which the modified data is generated by a modification process different from those described above. In the third modification, the multi-level data generation unit generates a plurality of modified data with mutually different sharpness by superimposing on the input image a smaller amount of noise the higher the information level and a larger amount of noise the lower the information level. In that case, the multi-level data generation unit randomly selects a predetermined number of pixels in the input image and superimposes noise by changing the values of the selected pixels to random values. By varying the number of pixels whose values are changed, it generates a plurality of images on which different amounts of noise are superimposed. The more noise is superimposed, the lower the signal-to-noise (SN) ratio of the resulting image and the lower its sharpness; the less noise is superimposed, the higher the SN ratio and the higher the sharpness.

Alternatively, the multi-level data generation unit may generate a plurality of images with mutually different sharpness by dividing the pixels of the image into groups (segments) of adjacent pixels with similar pixel values and, for each segment, replacing the pixel values of its pixels with a single value. In that case, the multi-level data generation unit treats a group of adjacent pixels whose pixel values differ in absolute value by no more than a threshold as a segment, and replaces the pixel values in each segment with the average pixel value of that segment. By varying this threshold, it generates a plurality of images with mutually different sharpness. The higher the threshold for grouping adjacent pixels, the lower the sharpness of the resulting image; the lower the threshold, the higher the sharpness.
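The noise-superposition variant can be sketched as follows. This is a minimal sketch with hypothetical function names; the 8-bit value range and the seeding scheme are assumptions made so that the example is reproducible.

```python
import random
from typing import List

Image = List[List[int]]


def add_random_noise(img: Image, num_pixels: int, seed: int = 0) -> Image:
    """Replace `num_pixels` randomly chosen pixels with random 8-bit values."""
    h, w = len(img), len(img[0])
    rng = random.Random(seed)
    out = [row[:] for row in img]  # copy so the input image is left untouched
    for idx in rng.sample(range(h * w), num_pixels):
        out[idx // w][idx % w] = rng.randrange(256)
    return out


def noise_levels(img: Image, counts: List[int]) -> List[Image]:
    """One noisy image per pixel count; a larger count means a lower information level."""
    return [add_random_noise(img, c, seed=c) for c in counts]
```

A count of 0 returns an unmodified copy, which can serve as the image with the highest information level.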

A fourth modification of the first embodiment is another example in which the modified data is generated by a modification process different from those described above; here the information level is clarity, that is, the degree of detail with which the features of the detection target are expressed (the degree to which they can be expressed). In the fourth modification, the multi-level data generation unit generates, as the plurality of modified data with mutually different information levels, a plurality of modified data whose image clarity differs, with higher clarity for higher information levels and lower clarity for lower information levels. For example, the multi-level data generation unit generates a plurality of images in which the number of gradations of the pixel values is larger the higher the information level and smaller the lower the information level.

Alternatively, the multi-level data generation unit may generate a plurality of images in which the contrast is higher the higher the information level and lower the lower the information level. In that case, the multi-level data generation unit generates, from the input image, images in which the luminance value of each pixel is converted so that the standard deviation of the luminance values of all pixels in the image becomes smaller.
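Both clarity variants can be sketched in a few lines. This is a minimal sketch with hypothetical function names; the 8-bit pixel range and the choice of pulling pixels linearly toward the image mean are assumptions, since the description only requires that the gradation count or the standard deviation be reduced.

```python
from typing import List

Image = List[List[int]]


def reduce_gradations(img: Image, levels: int) -> Image:
    """Quantize 8-bit pixel values to `levels` gray levels (2 <= levels <= 256)."""
    step = 256 / levels
    return [[int(int(p / step) * step) for p in row] for row in img]


def reduce_contrast(img: Image, factor: float) -> List[List[float]]:
    """Pull every pixel toward the image mean; the standard deviation scales by `factor`."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[mean + factor * (p - mean) for p in row] for row in img]
```

Calling either function with a series of decreasing `levels` or `factor` values yields a set of images with progressively lower information levels.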

A fifth modification of the first embodiment is an example in which the information level is the degree to which the data is suited to expressing the features of the detection target. In the fifth modification, the multi-level data generation unit generates, as the plurality of modified data with mutually different information levels, a plurality of modified data that differ in the degree to which the portion of the input data representing the detection target is masked (hereinafter, the masking degree): the higher the information level, the lower the masking degree, and the lower the information level, the higher the masking degree. When the input data is image data, the multi-level data generation unit generates a plurality of images that differ in the masking degree applied to the portion of the input image representing the human body to be detected. In that case, the multi-level data generation unit sets within the image a masking region of a predetermined size that masks part of the image, and replaces the pixel values inside the masking region with a fixed value. As the plurality of images with mutually different masking degrees, it generates a plurality of images whose masking regions differ in size.

FIG. 6 is a schematic diagram for explaining the masking region. In the example shown in FIG. 6, masking regions 600, 601, and 602 of mutually different sizes are each set so that their center coincides with the center position 604 of the image 603.

The larger the masking region, the higher the masking degree of the resulting image and the less accurately the features of the human body appear in it, so the information level is lower; the smaller the masking region, the lower the masking degree and the higher the information level. An image with a masking region of size 0, that is, the original image, can be used as the image with the lowest masking degree.
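Centered masking with regions of different sizes, as in FIG. 6, can be sketched as follows. This is a minimal sketch: the function names, the square mask shape, and the fill value 0 are assumptions, and the mask is assumed to be no larger than the image.

```python
from typing import List

Image = List[List[int]]


def mask_center(img: Image, mask_h: int, mask_w: int, fill: int = 0) -> Image:
    """Overwrite a mask_h x mask_w region centred on the image with `fill`."""
    h, w = len(img), len(img[0])
    top, left = (h - mask_h) // 2, (w - mask_w) // 2
    out = [row[:] for row in img]  # copy so the input image is left untouched
    for y in range(top, top + mask_h):
        for x in range(left, left + mask_w):
            out[y][x] = fill
    return out


def masking_levels(img: Image, sizes: List[int]) -> List[Image]:
    """One image per square mask size; size 0 returns a copy of the original."""
    return [mask_center(img, s, s) for s in sizes]
```

Passing sizes in increasing order produces images in order of decreasing information level.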

Alternatively, the multi-level data generation unit may generate, as the plurality of images with mutually different masking degrees, a plurality of images whose masking regions differ in position. For example, when the detection target is a standing person, the characteristic parts of the human body are likely to be concentrated near the upper part of the captured image. In this case, moving the masking region from the upper part toward the lower part progressively increases the degree to which the image is suited to expressing the features of the human body.

A sixth modification of the first embodiment is an example in which modified data differing in the degree to which they are suited to expressing the features of the detection target are generated by a modification process different from those described above. In the sixth modification, the multi-level data generation unit generates features of the detection target and uses, as the plurality of modified data with mutually different information levels, a plurality of data that differ in the amount of information the features carry: the higher the information level, the larger the level of information amount represented by the features (hereinafter, the analysis level), and the lower the information level, the smaller the analysis level. In this case, the features to be input to each weak classifier of the evaluation value calculation unit, as determined by prior learning, are computed by the multi-level data generation unit rather than by the evaluation value calculation unit, and the multi-level data generation unit varies the analysis level of the computed features.

For example, when HOG features are used as the human body features, the multi-level data generation unit can generate, as the plurality of modified data with mutually different analysis levels, a plurality of modified data in which the number of quantization levels used to express the features is larger the higher the analysis level and smaller the lower the analysis level. As described above, a HOG feature is obtained as a histogram of the sum of gradient strengths for each gradient direction in each cell of the image. The multi-level data generation unit changes the analysis level by varying the number of quantization levels of the histogram, that is, the number of gradient directions, over a preset range such as 2, 3, 4, ..., 9. The fewer the quantization levels, the more of the features of the human body are lost and the lower the information level of the extracted features; conversely, the more quantization levels, the higher the information level.

Alternatively, the multi-level data generation unit may change the analysis level by weighting the features. For example, when HOG features are used, the multi-level data generation unit sets different weighting coefficients for cells near the center of the partial image and for cells at its edges, away from the center, and changes the analysis level by multiplying the histogram of each cell by its weighting coefficient. In this case, the larger the difference between the weighting coefficients of the central cells and the edge cells, the more of the features of the human body are lost and the lower the information level of the extracted features; conversely, the smaller the difference, the higher the information level.

As a further example, the multi-level data generation unit generates, as the plurality of modified data with mutually different analysis levels, a plurality of modified data in which each element of the feature is multiplied by a positive coefficient α of at most 1 (0 < α ≤ 1.0) that is larger the higher the analysis level and smaller the lower the analysis level. The multi-level data generation unit varies the coefficient α over a preset range, for example 0.1, 0.2, 0.3, ..., 1.0. When Haar-like features are used, for example, each of the luminance differences between adjacent rectangular regions computed as Haar-like features is multiplied by α to generate the modified data. The smaller α is, the more of the features of the human body are lost and the lower the information level of the resulting features; conversely, the larger α is, the higher the information level.
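Two of the analysis-level changes described above, varying the number of gradient directions and multiplying by α, can be sketched as follows. This is a minimal sketch with hypothetical function names. It assumes that each pixel's gradient is already available as an (angle in degrees, strength) pair and that directions are unsigned (taken modulo 180°), which the description does not state.

```python
from typing import List, Sequence, Tuple


def orientation_histogram(gradients: Sequence[Tuple[float, float]], bins: int) -> List[float]:
    """Sum gradient strengths per direction bin for one cell (angles modulo 180 degrees)."""
    hist = [0.0] * bins
    width = 180.0 / bins
    for angle, strength in gradients:
        hist[int((angle % 180.0) / width) % bins] += strength
    return hist


def scale_features(features: Sequence[float], alpha: float) -> List[float]:
    """Multiply every feature element by alpha, where 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    return [alpha * f for f in features]
```

Calling `orientation_histogram` with `bins` set to 2, 3, ..., 9 gives one feature per analysis level; calling `scale_features` with α set to 0.1, 0.2, ..., 1.0 does the same for the scaling variant.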

When a plurality of modified data with mutually different analysis levels are generated, in step S504 of the flowchart of FIG. 5 the multi-level data generation unit computes, from the partial image, the features to be input to each weak classifier of the evaluation value calculation unit, multiplies each element of the features by the coefficient α corresponding to the information level set in step S503, and sends the result to the evaluation value calculation unit. Step S505 is then omitted, and in step S506 the evaluation value calculation unit calculates the evaluation value from the features received from the multi-level data generation unit.

A seventh modification of the first embodiment is another example in which modified data differing in the degree to which they are suited to expressing the features of the detection target are generated by a modification process different from those described above. In the seventh modification, the multi-level data generation unit generates a plurality of modified data with mutually different information levels by applying a geometric transformation to the input image in several different ways, or by applying to the features a transformation equivalent to a geometric transformation.

For example, the multi-level data generation unit performs the geometric transformation by rotating the partial image. The larger the rotation angle, the more the inclination of the detection target in the image differs from its inclination in most training images, so the rotated image becomes less suited to expressing the features of the detection target and the information level decreases. In this case, the multi-level data generation unit generates a plurality of individual level images by rotating the partial image by a smaller angle the higher the information level and by a larger angle the lower the information level.

As another example, the multi-level data generation unit performs the geometric transformation by applying to the features a process equivalent to rotating the image. In this case, the features to be input to each weak classifier of the evaluation value calculation unit are computed by the multi-level data generation unit rather than by the evaluation value calculation unit. For example, when the features are HOG features, the multi-level data generation unit divides the partial image into a plurality of cells, calculates for each cell the gradient direction and gradient strength of the pixel value at each pixel in the cell, and obtains a histogram whose frequency for each gradient direction is the sum of the gradient strengths in that direction. The multi-level data generation unit then obtains, as the HOG feature, the histogram with its frequencies (the sums of gradient strengths for each gradient direction) cyclically shifted across the gradient directions by a predetermined number of steps. It computes a plurality of HOG features with mutually different numbers of shift steps and uses them as the plurality of modified data with mutually different information levels.

When the features are Haar-like features, the multi-level data generation unit rotates, for each weak classifier, the adjacent rectangular regions corresponding to the Haar-like feature that prior learning assigned to that weak classifier, and obtains the luminance difference between the rotated adjacent rectangular regions as the Haar-like feature to be input to that weak classifier. In this case, the multi-level data generation unit computes a plurality of Haar-like features while varying the rotation angle of those adjacent rectangular regions stepwise between 0° and 180°, and uses them as the plurality of modified data with mutually different information levels.

These feature-rotation processes can be performed with a lower load than rotating the image.
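The cyclic shift of a HOG direction histogram can be sketched as follows. This is a minimal sketch with hypothetical function names, operating on the histogram of a single cell; the histogram is assumed to be non-empty, and the direction of the shift is an arbitrary choice for this example.

```python
from typing import List, Sequence


def rotate_histogram(hist: Sequence[float], steps: int) -> List[float]:
    """Cyclically shift the direction bins of one cell histogram by `steps` positions."""
    n = len(hist)
    k = steps % n
    return [hist[(i - k) % n] for i in range(n)]


def rotated_hog_levels(hist: Sequence[float], max_steps: int) -> List[List[float]]:
    """Histograms shifted by 0, 1, ..., max_steps bins."""
    return [rotate_histogram(hist, s) for s in range(max_steps + 1)]
```

Because only a short list is re-indexed per cell, no pixel resampling is needed, which is consistent with the lower load noted above.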

Alternatively, the multi-level data generation unit may perform the geometric transformation by converting the input image into a trapezoid or a parallelogram. The greater the deformation, the more the detection target in the image differs from its original shape, so the deformed image becomes less suited to expressing the features of the detection target and the information level decreases.

For example, the multi-level data generation unit performs a trapezoidal transformation that shortens one of the four sides of the input image, and generates a plurality of modified data in which the ratio of that side to its opposite side is closer to 1 the higher the information level and further from 1 the lower the information level.

Alternatively, the multi-level data generation unit performs a geometric transformation that changes the four interior angles of the input image, and generates a plurality of data in which each transformed angle is closer to 90° the higher the information level and further from 90° the lower the information level.

In an eighth modification of the first embodiment, the evaluation value calculation unit has an AdaBoost classifier instead of a Real AdaBoost classifier. This classifier consists of a plurality of weak classifiers and a strong classifier that integrates the determination results of the weak classifiers. Each weak classifier receives the feature determined in advance for it and, based on that feature, outputs 1 if it determines that a human body appears in the corresponding partial image and -1 if it determines that no human body appears. The strong classifier weights the output value of each weak classifier, computes the weighted sum, and outputs it as the evaluation value. Which feature is input to each weak classifier, and the weight for each weak classifier, are determined by prior learning using features calculated from a plurality of training human body images in which a human body appears and a plurality of training non-human body images in which no human body appears.
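The weighted sum computed by the strong classifier can be sketched as follows. This is a minimal sketch: `stump` is a hypothetical threshold-based weak classifier introduced only so the example is runnable, and the weights are placeholders rather than learned values.

```python
from typing import Callable, Sequence

WeakClassifier = Callable[[Sequence[float]], int]


def stump(index: int, threshold: float) -> WeakClassifier:
    """Hypothetical weak classifier: +1 if features[index] exceeds threshold, else -1."""
    return lambda features: 1 if features[index] > threshold else -1


def adaboost_evaluation(
    features: Sequence[float],
    weak_classifiers: Sequence[WeakClassifier],
    weights: Sequence[float],
) -> float:
    """Weighted sum of the +1/-1 weak classifier outputs, used as the evaluation value."""
    return sum(w * h(features) for h, w in zip(weak_classifiers, weights))
```

Unlike Real AdaBoost, where each weak classifier outputs a real-valued confidence, each weak output here is restricted to 1 or -1, so the evaluation value ranges between minus and plus the sum of the weights.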

Alternatively, the evaluation value calculation unit may calculate the evaluation value using a support vector machine, a perceptron with three or more layers, a random forest, or the like. In that case, the computer that performs the prior learning extracts one or more features from each of a plurality of training human body images and a plurality of training non-human body images, and performs the prior learning using the extracted features. The prior learning is carried out so that, when a feature extracted from a particular image is input, the classifier discriminates whether a human body appears in that image. When a support vector machine is used, the evaluation value calculation unit calculates as the evaluation value a value that is positive when the feature lies in the human-body-side region of the feature space relative to the identification boundary obtained by prior learning, negative when it lies in the non-human-body-side region, and whose absolute value corresponds to the distance between the feature and the identification boundary. When a perceptron with three or more layers is used, the evaluation value calculation unit uses the sum of the inputs to the neurons of the output layer as the evaluation value. When a random forest is used, the evaluation value calculation unit combines the outputs of the decision trees generated by prior learning into an evaluation value that is higher the more likely it is that a human body appears in the image.

Alternatively, the evaluation value calculation unit may output the evaluation value using linear discriminant analysis. In that case, the computer that performs the prior learning extracts one or more features from each of a plurality of training human body images and a plurality of training non-human body images, and creates a linear discriminant function from the extracted features. The computer creates the linear discriminant function so that, when a feature extracted from a particular image is input, it outputs a higher value the more likely it is that a human body appears in that image. The evaluation value calculation unit then uses the output value of the linear discriminant function as the evaluation value.

Alternatively, the evaluation value calculation unit may output the evaluation value using a Gaussian mixture distribution. In that case, the computer that performs the prior learning extracts one or more features from each of a plurality of training human body images and creates a Gaussian mixture distribution from the extracted features. The evaluation value calculation unit uses as the evaluation value the probability obtained when a feature extracted from a particular image is input to the resulting Gaussian mixture distribution. Because prior learning with a Gaussian mixture distribution uses only training data of the detection target, there is no need to collect training data other than the detection target, that is, training non-human body images in which no human body appears.

As another example, the evaluation value calculation unit may calculate the evaluation value using a plurality of classifiers, each trained by machine learning on different training data. In that case, the evaluation value calculation unit connects the classifiers in series and runs the classification process in order from the first-stage classifier, repeating it until one of the classifiers determines that no human body appears in the image. The evaluation value calculation unit judges that a classifier has determined that no human body appears in the image when the output value of that classifier is equal to or less than a threshold. This threshold can be set, on the basis of prior experiments, to a value that separates the output values calculated for a plurality of test human-body images from the output values calculated for a plurality of test non-human-body images. The evaluation value calculation unit then uses the number of classifiers that determined that a human body appears in the image as the evaluation value.
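The serial arrangement described above can be sketched as follows. Each classifier is represented by a callable that returns an output value, and the evaluation value is the number of stages passed before the first rejection. The function and parameter names are hypothetical.

```python
def cascade_evaluation_value(classifiers, thresholds, features):
    """Count the classifiers that accept the image before the first rejection.

    classifiers: callables returning an output value for `features`.
    thresholds:  one threshold per classifier; an output <= threshold means
                 "no human body appears in the image".
    """
    passed = 0
    for classify, threshold in zip(classifiers, thresholds):
        if classify(features) <= threshold:
            break  # this stage rejected the image, so stop here
        passed += 1
    return passed
```

With three stages, an image rejected by the third stage yields an evaluation value of 2, and an image accepted by every stage yields 3.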

In a ninth modification of the first embodiment, the evaluation value calculation unit calculates the evaluation value by pattern matching instead of by a classifier trained by machine learning. In this case, the control unit generates in advance a data pattern by applying averaging or similar processing to a plurality of training data known to represent the detection target, and stores the pattern in the storage unit as reference data. The evaluation value calculation unit calculates, as the evaluation value, the degree of similarity between each image received from the multi-level data generation unit and the data pattern stored as the reference data. The degree of similarity can be, for example, the inner product of each image and the reference data.
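A minimal sketch of this pattern matching, assuming each image has been flattened into a list of pixel values of equal length (the flattening and the function names are assumptions of the example):

```python
def average_pattern(training_images):
    """Reference data: pixel-wise average of training images of the target."""
    n = len(training_images)
    length = len(training_images[0])
    return [sum(img[i] for img in training_images) / n for i in range(length)]


def similarity(image, reference):
    """Evaluation value: inner product of the image and the reference data."""
    return sum(p * r for p, r in zip(image, reference))
```

Because a plain inner product grows with overall brightness, an implementation might normalize both vectors first; the text above leaves that choice open.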

In a tenth modification of the first embodiment, instead of calculating the evaluation value by a classifier trained by machine learning or by pattern matching, the evaluation value calculation unit uses as the evaluation value the degree to which data specific to the detection target is extracted from the input data. For example, when the detection target is a face, the data to be extracted can be pixels representing skin color (hereinafter referred to as skin color pixels). In that case, the control unit sets the range of pixel values to be extracted as skin color pixels and a standard ratio of skin color pixels at which the image can be regarded as a face, and stores them in the storage unit in advance. The evaluation value calculation unit extracts skin color pixels from each of the images received from the multi-level data generation unit. For each image, the evaluation value calculation unit obtains the absolute value of the difference between the ratio of the number of skin color pixels to the total number of pixels in the image and the standard ratio stored in advance in the storage unit, and uses the reciprocal of that absolute value as the evaluation value.
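The skin color calculation can be sketched as below. The RGB bounds in `in_skin_range` are placeholder values chosen only for illustration, and the small constant `eps`, which avoids division by zero when the measured ratio equals the standard ratio, is an assumption that the text does not address.

```python
def in_skin_range(pixel, low=(95, 40, 20), high=(255, 220, 180)):
    """Hypothetical skin color test: every RGB channel lies within [low, high]."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, low, high))


def skin_ratio_evaluation(pixels, standard_ratio, eps=1e-6):
    """Reciprocal of |skin pixel ratio - standard ratio| for one image.

    pixels: list of (R, G, B) tuples covering the whole image.
    """
    ratio = sum(1 for p in pixels if in_skin_range(p)) / len(pixels)
    # eps is an assumption: it keeps the reciprocal finite for a perfect match.
    return 1.0 / max(abs(ratio - standard_ratio), eps)
```

The closer the measured ratio is to the standard ratio, the larger the evaluation value becomes.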

In an eleventh modification of the first embodiment, the evaluation value calculation unit outputs, as the evaluation value, a degree representing the likelihood that the data is not the detection target, instead of a degree representing the likelihood that it is the detection target. In this case, the target sequence determination unit determines whether the input data contains the detection target by means of a sequence classifier trained on evaluation value series generated from evaluation values that represent the likelihood of not being the detection target.

In a twelfth modification of the first embodiment, the target sequence determination unit has a sequence classifier based on AdaBoost instead of a sequence classifier based on a support vector machine. Each weak classifier receives a predetermined number of consecutive elements of the evaluation value series, the elements being determined in advance for each weak classifier. Based on the input elements, each weak classifier outputs 1 when it determines that a human body appears in the corresponding partial image and outputs -1 when it determines that no human body appears. The strong classifier weights the output value of each weak classifier and obtains the weighted sum. If the weighted sum is greater than a determination threshold, the strong classifier outputs a classification result indicating that the input evaluation value series was generated from input data containing the detection target; if the weighted sum is equal to or less than the determination threshold, it outputs a classification result indicating that the input evaluation value series was not generated from input data containing the detection target. Which elements of the evaluation value series are input to each weak classifier, and the weight for each weak classifier, are determined by prior learning using evaluation value series calculated from a plurality of training human-body images and a plurality of training non-human-body images.
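The decision rule of the strong classifier can be sketched as follows. Each weak classifier is modeled as a tuple of a window position, a window length, a decision function, and a weight. The tuple layout and the decision functions are hypothetical, and in the modification the windows and weights would come from prior learning rather than being written by hand.

```python
def strong_classify(series, weak_classifiers, decision_threshold=0.0):
    """Weighted vote of weak classifiers over windows of the evaluation series.

    weak_classifiers: list of (start, length, decide, weight), where `decide`
    maps a window of consecutive evaluation values to True (human body) or
    False (no human body).
    """
    total = 0.0
    for start, length, decide, weight in weak_classifiers:
        window = series[start:start + length]
        vote = 1 if decide(window) else -1  # weak classifier output is +1 or -1
        total += weight * vote
    # Strictly greater than the threshold means "contains the detection target".
    return total > decision_threshold
```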

Alternatively, the target sequence determination unit may have a sequence classifier based on a mixture of normal distributions instead of a sequence classifier based on a support vector machine. In that case, a computer that performs prior learning creates the mixture distribution using evaluation value series generated in advance from a plurality of training human-body images. The sequence classifier inputs the evaluation value series generated from a given image to the created mixture distribution and acquires the output probability. If the acquired probability is greater than a determination threshold, the sequence classifier outputs a classification result indicating that the input evaluation value series was generated from input data containing the detection target; if the probability is equal to or less than the determination threshold, it outputs a classification result indicating that the input evaluation value series was not generated from input data containing the detection target. Because the mixture distribution is learned from training data of the detection target only, there is no need to collect training data other than the detection target, that is, training non-human-body images in which no human body appears.

Next, a monitoring system in which a target detection device according to a second embodiment of the present invention is implemented will be described with reference to the drawings.
The monitoring device according to this embodiment uses a plurality of change processes to generate, from the input data, a plurality of change data that differ from one another in the degree to which they represent the detection target, and combines the evaluation values calculated for the generated change data into an evaluation value series. By determining whether the input data contains the detection target using an evaluation value series based on change data generated by different change processes, the monitoring device aims to further improve the accuracy of detecting the detection target from the input data.

The schematic configuration of the monitoring system according to the second embodiment is the same as that of the monitoring system according to the first embodiment shown in FIG. 3. In the second embodiment, each monitoring device is described as the monitoring device 20. Like the monitoring device 10 according to the first embodiment shown in FIG. 3, the monitoring device 20 has an imaging unit, an interface unit, a communication unit, a storage unit, and a control unit. The imaging unit, interface unit, communication unit, and storage unit of the monitoring device 20 are the same as the imaging unit 11, interface unit 12, communication unit 13, and storage unit 14 of the monitoring device 10, and are therefore described as the imaging unit 11, interface unit 12, communication unit 13, and storage unit 14, respectively. The control unit of the monitoring device 20, on the other hand, differs in part of its functions from the control unit 15 of the monitoring device 10, and is therefore described as the control unit 25.

The control unit 25 is an example of the target detection device. Like the control unit 15 according to the first embodiment shown in FIG. 4, it has, as functional modules implemented by software running on a processor, a data input unit, a cutout unit, a multi-level data generation unit, an evaluation value calculation unit, an evaluation value series generation unit, a target determination unit, and a notification control unit. In the second embodiment, these units of the control unit 25 are referred to as the data input unit 250, cutout unit 251, multi-level data generation unit 252, evaluation value calculation unit 253, evaluation value series generation unit 254, target determination unit 255, and notification control unit 256. The data input unit 250, cutout unit 251, and notification control unit 256 of the control unit 25 are the same as the data input unit 150, cutout unit 151, and notification control unit 156 of the control unit 15, so their description is omitted. The multi-level data generation unit 252, evaluation value calculation unit 253, evaluation value series generation unit 254, and target determination unit 255 are described in detail below.
Each of these units of the control unit 25 may instead be configured as an independent integrated circuit, firmware, a microprocessor, or the like.

The multi-level data generation unit 252 uses a plurality of change processes to generate, from one piece of input data, a plurality of change data that differ from one another in the degree to which they represent the detection target, and sends the generated change data to the evaluation value calculation unit 253 in association with their information levels.
An example in which the multi-level data generation unit 252 generates change data by two change processes is described below. From one piece of input data, the multi-level data generation unit 252 generates a plurality of first change data whose degrees of representing the detection target are varied by a first change process, and also generates a plurality of second change data whose degrees of representing the detection target are varied by a second change process different from the first. For example, the multi-level data generation unit 252 uses, as the first change process, a process that changes the degree of representing the detection target with an averaging filter, and, as the second change process, a process that changes the degree of representing the detection target by adding noise.
The first and second change processes may also be combined differently: any mutually different change processes can be combined from among the various change processes described above as modifications of the first embodiment. Three or more kinds of change process may also be combined.
The multi-level data generation unit 252 generates 33 first change data with mutually different information levels by applying to the partial image averaging filters whose filter size is smaller the higher the information level and larger the lower the information level. The multi-level data generation unit 252 also generates 10 second change data with mutually different information levels by superimposing on the partial image a smaller amount of noise the higher the information level and a larger amount of noise the lower the information level.
The multi-level data generation unit 252 sends the generated first change data and second change data to the evaluation value calculation unit 253 as one group of change data.
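The two change processes in this example can be sketched as follows for a small grayscale image stored as a list of rows. The border handling of the averaging filter (the window is clipped at the image edge) and the form of the noise (a chosen fraction of pixels replaced with random values from 0 to 255) are assumptions made for the sketch, as are all function names.

```python
import random


def box_filter(image, size):
    """Averaging filter with a size x size window, clipped at the border."""
    h, w = len(image), len(image[0])
    r = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def add_noise(image, fraction, rng):
    """Replace `fraction` of the pixels with random values (assumed noise form)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for idx in rng.sample(range(h * w), round(fraction * h * w)):
        out[idx // w][idx % w] = rng.randrange(256)
    return out


def generate_change_data(image, filter_sizes, noise_fractions, seed=0):
    """Return (first change data, second change data), one item per level.

    Both parameter lists are given in ascending order of information level,
    i.e. largest filter first and largest noise fraction first.
    """
    rng = random.Random(seed)
    first = [box_filter(image, s) if s > 1 else [row[:] for row in image]
             for s in filter_sizes]
    second = [add_noise(image, f, rng) for f in noise_fractions]
    return first, second
```

Passing 33 filter sizes and 10 noise fractions would reproduce the counts given in the example above.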

The evaluation value calculation unit 253 calculates an evaluation value for each piece of change data and sends each evaluation value to the evaluation value series generation unit 254 in association with its information level. For example, when there are two kinds of change data, the evaluation value calculation unit 253 calculates an evaluation value for each of the plurality of first change data and each of the plurality of second change data, and associates each evaluation value with the corresponding first information level or second information level.

The evaluation value series generation unit 254 generates an evaluation value series in which the evaluation values calculated for the change data are arranged in a predetermined order defined by the information levels, and passes the generated evaluation value series to the target determination unit 255. For example, when 33 first change data and 10 second change data have been generated, the evaluation value series generation unit 254 generates a first series in which the 33 evaluation values calculated for the first change data are arranged in ascending order of information level in the first change process. The evaluation value series generation unit 254 further generates a second series in which the 10 evaluation values calculated for the second change data are arranged in ascending order of information level in the second change process. The evaluation value series generation unit 254 then generates the evaluation value series by placing the second series after the first series.
The order within the first series and the second series may instead be the descending order of the corresponding information levels or some other order, as long as the order used in prior learning and the order used in target detection are the same.
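A minimal sketch of this concatenation, assuming the evaluation values for each change process arrive as a dictionary keyed by information level (the dictionary representation and the function name are assumptions of the example):

```python
def build_evaluation_series(first_values, second_values):
    """Concatenate two per-process series, each sorted by ascending level.

    first_values / second_values: dicts mapping information level to the
    evaluation value calculated for the change data at that level.
    """
    first_series = [value for _, value in sorted(first_values.items())]
    second_series = [value for _, value in sorted(second_values.items())]
    return first_series + second_series
```

Whatever order is chosen, the same function would have to be used when building the training series, since the sequence classifier depends on the position of each element.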

FIGS. 7A and 7B show an example of an evaluation value series in which the evaluation values calculated for the plurality of first change data and for the plurality of second change data are each arranged in ascending order of information level and then concatenated. Graph 700 in FIG. 7A shows an example of an evaluation value series generated for input data that contains the detection target, and graph 710 in FIG. 7B shows an example generated for input data that does not contain the detection target. In graphs 700 and 710, the horizontal axis represents the level and the vertical axis represents the evaluation value. Levels 1 to 33 correspond to first information levels 1 to 33. The evaluation values at levels 1 to 32 are those calculated for first change data to which an averaging filter of size 4(n-1)+1 has been applied (n is an integer from 2 to 33, corresponding to first information levels 32 to 1, respectively), and the evaluation value at level 33 is that calculated for the original input data. Levels 34 to 43 correspond to second information levels 1 to 10. The evaluation values at levels 34 to 42 are those calculated for second change data to which noise has been added in an amount of 45%, 40%, ..., 5% relative to the number of pixels in the entire image, and the evaluation value at level 43 is that calculated for the original input data.
As shown in graph 700 of FIG. 7A, when the input data contains the detection target, the evaluation value tends to rise sharply as the level increases in two regions: levels 25 to 32 and levels 34 to 43. In contrast, as shown in graph 710 of FIG. 7B, no such tendency is seen when the input data does not contain the detection target. Thus, when an evaluation value series is generated for several kinds of information level, a characteristic tendency appears once for each kind of information level if the input data contains the detection target, and does not appear if the input data does not contain the detection target. The monitoring device 20 can therefore further improve the accuracy of detecting the detection target.
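The correspondence between the levels on the horizontal axis and the parameters of the two change processes can be written out as a short sketch. The function names are hypothetical; the arithmetic follows the figures stated for FIGS. 7A and 7B, with `None` and 0 standing for the unmodified input data.

```python
def filter_size_for_first_level(level):
    """Averaging filter size for first information levels 1..33."""
    if not 1 <= level <= 33:
        raise ValueError("first information level must be in 1..33")
    if level == 33:
        return None  # level 33 is the original input data (no filter)
    n = 34 - level   # n runs from 33 (level 1) down to 2 (level 32)
    return 4 * (n - 1) + 1


def noise_percent_for_second_level(level):
    """Noise amount in percent for second information levels 1..10."""
    if not 1 <= level <= 10:
        raise ValueError("second information level must be in 1..10")
    return 5 * (10 - level)  # 45% at level 1, 5% at level 9, 0% at level 10


def second_level_from_axis(axis_level):
    """Map axis levels 34..43 of graphs 700 and 710 to second levels 1..10."""
    if not 34 <= axis_level <= 43:
        raise ValueError("axis level must be in 34..43")
    return axis_level - 33
```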

The target determination unit 255 has a sequence classifier trained in advance using training evaluation value series arranged in the same order as the evaluation value series that the evaluation value series generation unit 254 generates for the information levels. The target determination unit 255 inputs the evaluation value series generated by the evaluation value series generation unit 254 to the sequence classifier, determines from the output classification result whether the input data contains the detection target, and outputs the determination result.

The operation of the target detection process performed by the monitoring device 20 is described below with reference to the flowchart shown in FIG. 8. This flowchart can be executed in the monitoring device 20 in place of the flowchart shown in FIG. 5 described above. The flow of operations described below is controlled by the control unit 25 in accordance with a program stored in the storage unit 14 and read into the control unit 25. The processing of steps S801 to S807 and S811 to S813 of the flowchart shown in FIG. 8 is the same as the processing of steps S501 to S507 and S510 to S512 of the flowchart shown in FIG. 5, so its description is omitted; only the processing of steps S808 to S810 is described below.

The processing of steps S803 to S807 is executed for each change process (the first change process, the second change process, and so on). In step S808, the control unit 25 determines whether the processing of steps S803 to S807 has been executed for all of the change processes. If it has not (NO in step S808), the control unit 25 returns the process to step S803 and repeats the processing of steps S803 to S807. If the processing of steps S803 to S807 has been executed for all of the change processes (YES in step S808), the evaluation value series generation unit 254 generates the evaluation value series using the evaluation values received so far from the evaluation value calculation unit 253 (step S809).

Next, the target determination unit 255 inputs the evaluation value series generated by the evaluation value series generation unit 254 to the sequence classifier, which has been trained in advance using training evaluation value series arranged in the same order as the evaluation value series generated by the evaluation value series generation unit 254, and acquires the classification result output from the sequence classifier (step S810).
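The loop over change processes and the final classification (steps S803 to S810) can be summarized in a short sketch. The callables stand in for the multi-level data generation unit, the evaluation value calculation unit, and the sequence classifier; their names and signatures are hypothetical, and the sketch assumes that each change process returns its change data already ordered by information level.

```python
def detect_target(partial_image, change_processes, evaluate, sequence_classifier):
    """Build one evaluation value series across all change processes, then classify.

    change_processes:    callables, each returning the change data for one
                         change process in ascending order of information level.
    evaluate:            maps one piece of change data to an evaluation value.
    sequence_classifier: maps the full series to True (target present) or False.
    """
    series = []
    for make_change_data in change_processes:       # repeat until S808 is YES
        for data in make_change_data(partial_image):
            series.append(evaluate(data))           # per-level evaluation
    return sequence_classifier(series)              # S809 (series) and S810
```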

As described above, the monitoring device according to this embodiment uses a plurality of change processes to generate, from the input data, a plurality of change data that differ from one another in the degree to which they represent the detection target, and combines the evaluation values calculated for the generated change data into an evaluation value series. By determining whether the input data contains the detection target using an evaluation value series based on change data generated by different change processes, the monitoring device can further improve the accuracy of detecting the detection target from the input data.

Next, a monitoring system in which a target detection device according to a third embodiment of the present invention is implemented will be described with reference to the drawings.
The monitoring device according to this embodiment generates an evaluation value series by smoothing an original series in which the evaluation values are arranged in a preset order defined by the information levels, and uses the smoothed evaluation value series to determine whether the input data contains the detection target. In this way, the monitoring device removes the influence of fine variations of the evaluation value with information level and aims to further improve the accuracy of detecting the detection target from the input data.

The schematic configuration of the monitoring system according to the third embodiment is the same as that of the monitoring system according to the first embodiment shown in FIG. 3. In the third embodiment, each monitoring device is described as the monitoring device 30. Like the monitoring device 10 according to the first embodiment shown in FIG. 3, the monitoring device 30 has an imaging unit, an interface unit, a communication unit, a storage unit, and a control unit. The imaging unit, interface unit, communication unit, and storage unit of the monitoring device 30 are the same as the imaging unit 11, interface unit 12, communication unit 13, and storage unit 14 of the monitoring device 10, and are therefore described as the imaging unit 11, interface unit 12, communication unit 13, and storage unit 14, respectively. The control unit of the monitoring device 30, on the other hand, differs in part of its functions from the control unit 15 of the monitoring device 10, and is therefore described as the control unit 35.

The control unit 35 is an example of the target detection device. Like the control unit 15 of the monitoring device 10 according to the first embodiment shown in FIG. 4, it has, as functional modules implemented by software running on a processor, a data input unit, a cutout unit, a multi-level data generation unit, an evaluation value calculation unit, an evaluation value series generation unit, a target determination unit, and a notification control unit. In the third embodiment, these units of the control unit 35 are referred to as the data input unit 350, cutout unit 351, multi-level data generation unit 352, evaluation value calculation unit 353, evaluation value series generation unit 354, target determination unit 355, and notification control unit 356. The data input unit 350, cutout unit 351, multi-level data generation unit 352, evaluation value calculation unit 353, target determination unit 355, and notification control unit 356 of the control unit 35 are the same as the data input unit 150, cutout unit 151, multi-level data generation unit 152, evaluation value calculation unit 153, target determination unit 155, and notification control unit 156 of the control unit 15, so their description is omitted. The evaluation value series generation unit 354 is described in detail below.
Each of these units of the control unit 35 may instead be configured as an independent integrated circuit, firmware, a microprocessor, or the like.

The evaluation value series generation unit 354 generates an original series in which the evaluation values calculated for the change data included in the multi-level data are arranged in ascending or descending order of information level, smooths the original series to generate the evaluation value series, and passes it to the target determination unit 355. For example, in an original series in which the evaluation values are arranged in ascending order of information level, the evaluation value series generation unit 354 smooths each evaluation value by taking a weighted moving average of that evaluation value and a predetermined number of evaluation values on the lower information level side, taken in order of closeness of information level. Alternatively, the evaluation value series generation unit 354 may smooth each evaluation value by taking a simple moving average of that evaluation value and a predetermined number of evaluation values on the lower information level side, taken in order of closeness of information level.
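The trailing moving average described above can be sketched as follows. The original series is assumed to be in ascending order of information level, so the values on the lower information level side are the preceding elements. How the first few elements are treated, where fewer preceding values exist, is not stated in the text; the sketch renormalizes over the values that are available, which is an assumption.

```python
def smooth_series(original, weights):
    """Trailing weighted moving average over an ascending-level series.

    weights[0] applies to the current evaluation value and weights[k] to the
    value k information levels lower. Equal weights give the simple moving
    average.
    """
    smoothed = []
    for i in range(len(original)):
        numerator = denominator = 0.0
        for k, w in enumerate(weights):
            if i - k < 0:
                break  # assumption: use only the lower-level values that exist
            numerator += w * original[i - k]
            denominator += w
        smoothed.append(numerator / denominator)
    return smoothed
```

For example, `smooth_series([1.0, 3.0, 5.0], [1, 1])` returns `[1.0, 2.0, 4.0]`.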

FIGS. 9A and 9B show graphs for explaining the smoothed evaluation value series. A graph 900 in FIG. 9A shows an example of an original series before smoothing, and a graph 910 in FIG. 9B shows an example of the evaluation value series obtained by smoothing the original series of the graph 900. In the graphs 900 and 910, the horizontal axis represents the information level and the vertical axis represents the evaluation value.
As shown in the graph 900 of FIG. 9A, in the original series the evaluation value may rise as the information level rises in regions such as information levels 16 to 18 and 22 to 23, even though these are regions where the information level is medium, and the evaluation value may fall as the information level rises in a region such as information levels 29 to 30, even though this is a region where the information level is high. In contrast, as shown in the graph 910 of FIG. 9B, when the original series of the graph 900 is smoothed, the evaluation value falls steadily as the information level rises in the region where the information level is medium, and rises steadily as the information level rises in the region where the information level is high. The monitoring device 10 can therefore further improve the accuracy with which it detects the detection target.

The target determination unit 355 has a series classifier trained in advance using training evaluation value series obtained by smoothing original series in the same manner as the evaluation value series generated by the evaluation value series generation unit 354. The target determination unit 355 inputs the evaluation value series generated by the evaluation value series generation unit 354 to the series classifier, determines from the classification result it outputs whether or not the detection target is included in the input data, and outputs the determination result.

The operation of the target detection process performed by the monitoring device 30 is described below. The flowchart of the target detection process performed by the monitoring device 30 is the same as the flowchart shown in FIG. 5. However, in step S508, the evaluation value series generation unit 354 generates an original series in which the evaluation values received so far from the evaluation value calculation unit 353 are arranged in a predetermined order, and smooths the generated original series to generate the evaluation value series. Then, in step S509, the target determination unit 355 inputs the evaluation value series generated by the evaluation value series generation unit 354 to the series classifier, which was trained in advance using training evaluation value series smoothed in the same manner, and obtains the classification result output by the series classifier.
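The following Python sketch illustrates the point of steps S508 and S509 that the series classifier must be trained on series smoothed in the same way as the series it later receives. The nearest-centroid classifier used here is a deliberately simple stand-in chosen for the example; this section does not say which learning method the series classifier uses. The names `smooth`, `train`, and `contains_target`, and the window size, are likewise assumptions.

```python
from typing import List, Sequence, Tuple

Model = Tuple[List[float], List[float]]  # (positive centroid, negative centroid)


def smooth(original: Sequence[float], window: int = 3) -> List[float]:
    """Simple moving average over each value and its lower-side neighbours."""
    out = []
    for i in range(len(original)):
        chunk = original[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally long series."""
    return [sum(col) / len(col) for col in zip(*rows)]


def train(pos_originals: Sequence[Sequence[float]],
          neg_originals: Sequence[Sequence[float]],
          window: int = 3) -> Model:
    """Train the stand-in classifier on smoothed training series."""
    pos = centroid([smooth(s, window) for s in pos_originals])
    neg = centroid([smooth(s, window) for s in neg_originals])
    return pos, neg


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contains_target(original: Sequence[float], model: Model,
                    window: int = 3) -> bool:
    """S508: smooth the original series. S509: classify the smoothed series."""
    pos, neg = model
    series = smooth(original, window)      # same smoothing as in training
    return sq_dist(series, pos) < sq_dist(series, neg)
```

The design point is that `train` and `contains_target` share the same `smooth` call and the same `window`, so the classifier never sees a series prepared differently from its training data.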

As described above, the monitoring device according to the present embodiment generates an evaluation value series by smoothing an original series in which the evaluation values are arranged in a preset order defined by the information level, and uses the smoothed evaluation value series to determine whether or not the detection target is included in the input data. The monitoring device can thereby remove the influence of small fluctuations of the evaluation value with respect to the information level, and can detect the detection target from the input data stably and accurately.

In the monitoring systems of the first to third embodiments, the center device may include a control unit similar to the control unit of the monitoring device and determine whether or not a captured image includes a human body. In that case, the monitoring device transmits an image of the monitored area to the center device, and the center device determines whether or not the received image includes a human body and thereby determines whether or not a person has entered the monitored area. In this case as well, the monitoring system can improve the accuracy with which a human body is detected from the captured image.

In the target detection devices of the first to third embodiments, the detection target is not limited to a human body and may be, for example, a person's face or a mask. In this case, the target detection device determines whether or not the captured image includes a person's face, or whether or not it includes a face wearing a mask.
Furthermore, the input data is not limited to image data and may be, for example, an acoustic signal. In that case, the detection target can be, for example, a scream. The target detection device cuts out the signal of a sounded section from an acoustic signal generated from sound in a monitored space that is monitored for the occurrence of screams, generates from the cut-out signal a plurality of signals whose information levels differ from one another, and calculates from each generated signal an evaluation value representing the degree of scream-likeness. The target detection device then generates an evaluation value series in which the calculated evaluation values are arranged in a predetermined order. The target detection device inputs the generated evaluation value series to a series classifier trained in advance using evaluation value series generated for acoustic signals that include a scream and evaluation value series generated for acoustic signals that do not, and determines whether or not the acoustic signal includes a scream.
The input data may also be a Doppler signal. In that case, the detection target can be, for example, the movement of a human body. The target detection device extracts only the Doppler component contained in the reflected wave obtained by transmitting an electromagnetic wave into the monitored area, and uses it as the Doppler signal. The target detection device generates, from a signal cut out of the Doppler signal, a plurality of signals whose information levels differ from one another, and calculates from each generated signal an evaluation value representing the degree of human-body-likeness. The target detection device then generates an evaluation value series in which the calculated evaluation values are arranged in a predetermined order. The target detection device inputs the generated evaluation value series to a series classifier trained in advance using evaluation value series generated for Doppler signals caused by the movement of a human body and evaluation value series generated for Doppler signals not caused by the movement of a human body, and determines whether or not the Doppler signal is caused by the movement of a human body.
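For one-dimensional inputs such as the acoustic or Doppler signals above, the flow from a cut-out signal to an evaluation value series might be sketched as follows. This section does not specify how the information level of such a signal is changed, so the box filter used here (a wider filter for a lower information level) is purely an assumption for illustration. The `score` callable is a hypothetical placeholder for the evaluation value calculation, and all function names are invented for the example.

```python
from typing import Callable, List, Sequence


def box_filter(signal: Sequence[float], width: int) -> List[float]:
    """Centered moving average; the window is truncated at the signal edges."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def make_multilevel(signal: Sequence[float], num_levels: int) -> List[List[float]]:
    """Return signals ordered by ascending information level.

    Assumption: level 1 is the most heavily blurred signal and level
    `num_levels` is the unchanged signal (filter width 1).
    """
    levels = []
    for level in range(1, num_levels + 1):
        width = 2 * (num_levels - level) + 1   # odd widths: ..., 5, 3, 1
        levels.append(box_filter(signal, width))
    return levels


def evaluation_series(signal: Sequence[float],
                      score: Callable[[Sequence[float]], float],
                      num_levels: int = 4) -> List[float]:
    """Apply the placeholder `score` to each level, in ascending level order."""
    return [score(s) for s in make_multilevel(signal, num_levels)]
```

The resulting list would then be passed to a series classifier trained on series built the same way from signals that do and do not contain the detection target.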

As described above, those skilled in the art can make various modifications within the scope of the present invention to suit the form in which it is implemented.

10, 20, 30 Monitoring device
15, 25, 35 Control unit
150, 250, 350 Data input unit
151, 251, 351 Cutout unit
152, 252, 352 Multi-level data generation unit
153, 253, 353 Evaluation value calculation unit
154, 254, 354 Evaluation value series generation unit
155, 255, 355 Target determination unit
156, 256, 356 Notification control unit

Claims

1. A target detection device that determines whether or not a detection target is included in input data acquired from a data input unit, the target detection device comprising:
a multi-level data generation unit that generates, from the input data, a plurality of pieces of change data that differ from one another in information level, the information level being the degree to which the detection target is expressed;
an evaluation value calculation unit that calculates, for each piece of the change data, an evaluation value representing the degree of likelihood of the detection target;
an evaluation value series generation unit that generates an evaluation value series by arranging the evaluation values in a preset order defined by the information level; and
a target determination unit that determines whether or not the detection target is included in the input data, based on identification information for identifying whether or not the evaluation value series was generated from input data including the detection target.

2. The target detection device according to claim 1, wherein the multi-level data generation unit generates, from the input data by a first change process, first change data that differ from one another in the degree to which the detection target is expressed, generates, from the input data by a second change process different from the first change process, second change data that differ from one another in the degree to which the detection target is expressed, and uses the first change data and the second change data as the change data.

3. The target detection device according to claim 1, wherein the evaluation value series generation unit generates the evaluation value series by smoothing an original series in which the evaluation values are arranged in a preset order defined by the information level.