JP2010003116A

JP2010003116A - Object deciding device and program

Info

Publication number: JP2010003116A
Application number: JP2008161355A
Authority: JP
Inventors: Takashi Naito; 貴志内藤; Shinichi Kojima; 真一小島; Satoru Nakanishi; 悟中西; Takehiko Tanaka; 勇彦田中; Junya Kasugai; 純也春日井; Takuhiro Omi; 拓寛大見; Hiroyuki Ishizaka; 宏幸石坂
Original assignee: Aisin Seiki Co Ltd; Hino Motors Ltd; Denso Corp; Toyota Motor Corp; Toyota Central R&D Labs Inc
Current assignee: Hino Motors Ltd; Denso Corp; Toyota Motor Corp; Toyota Central R&D Labs Inc; Aisin Corp
Priority date: 2008-06-20
Filing date: 2008-06-20
Publication date: 2010-01-07
Anticipated expiration: 2028-06-20
Also published as: JP5127582B2

Abstract

PROBLEM TO BE SOLVED: To provide an object determination device and a program that can determine with high reliability whether or not an object is wearing an attachment such as a mask or sunglasses. SOLUTION: A likelihood value p1 indicating object-likeness is calculated using a machine learning system trained to respond to images of the object both wearing and not wearing a mask (S104); a likelihood value p2 indicating likeness to the object wearing the mask is calculated using a machine learning system trained to respond to images of the object wearing the mask (S106); a likelihood value p3 indicating likeness to the object not wearing the mask is calculated using a machine learning system trained to respond to images of the object not wearing the mask (S108); and whether or not the object shown in the captured image is wearing the mask is determined on the basis of the three calculated likelihood values p1, p2 and p3 (S110). COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an object determination device and a program, and more particularly to an object determination device and a program for determining whether or not an object is wearing an attachment.

When face state quantities of a subject, such as face orientation, blinking, and gaze, are estimated using facial features such as the mouth and nose extracted from a face image, the face state quantities cannot be estimated correctly if the subject is wearing a mask, sunglasses, or the like. As a method for estimating face orientation that takes attachments on the face into account, there is, for example, the technique described in Patent Document 1 below. In this technique, feature points such as the eyes, mouth, and nose are extracted from the face image to estimate the face orientation. If the feature points of the eyes cannot be extracted a predetermined number of times, it is determined that sunglasses are being worn, and the face orientation is estimated using image features on the sunglasses. Likewise, if the features of the nose and mouth cannot be extracted a predetermined number of times, it is determined that a mask is being worn, and the face orientation is estimated using image features on the mask.
Patent Document 1: JP 2003-296712 A

In the above conventional technique, it is determined that sunglasses or a mask is being worn when the features of the eyes, or of the nose and mouth, cannot be extracted. However, when estimating the face orientation of a driver driving a vehicle, for example, the features of the eyes, nose, and mouth frequently go undetected in an actual driving environment because of local brightness changes on the face image caused by fluctuations in the surrounding lighting, the face posture of the subject, and so on. As a result, even though no sunglasses or mask is being worn, the device determines that they are being worn because the eyes, nose, or mouth cannot be detected. State quantities such as face orientation are then estimated from feature points of sunglasses or a mask that do not actually exist, so the correct state quantity is not obtained and, on the contrary, a state quantity with a very large error is calculated.

The present invention has been proposed to solve the above problem, and its purpose is to provide an object determination device and a program that can determine with high reliability whether or not an object is wearing an attachment such as a mask or sunglasses.

In order to achieve the above object, the object determination device of the invention of claim 1 comprises: imaging means for imaging an object; likelihood value calculation means which, for the object represented by the image captured by the imaging means, calculates a first likelihood value indicating object-likeness using the captured image and a machine learning system trained to respond to images of the object wearing an attachment and images of the object not wearing the attachment, calculates a second likelihood value indicating likeness to the object wearing the attachment using the captured image and a machine learning system trained to respond to images of the object wearing the attachment, and calculates a third likelihood value indicating likeness to the object not wearing the attachment using the captured image and a machine learning system trained to respond to images of the object not wearing the attachment; and determination means for determining, on the basis of the first, second, and third likelihood values calculated by the likelihood value calculation means, whether or not the object represented by the captured image is wearing the attachment.

In this way, three likelihood values are obtained using a machine learning system trained to respond to images of the object both wearing and not wearing the attachment, a machine learning system trained to respond to images of the object wearing the attachment, and a machine learning system trained to respond to images of the object not wearing the attachment, and whether or not the object is wearing the attachment is determined on the basis of these three likelihood values. The presence or absence of the attachment can therefore be determined with high reliability. In particular, using likelihood values obtained from three different machine learning systems as evaluation criteria reduces false detections compared with using a single machine learning system.

The object determination device of the invention of claim 2 comprises: imaging means for imaging an object; extraction means for extracting, from the image captured by the imaging means, a predetermined region in which an attachment is presumed to exist if the object is assumed to be wearing the attachment; likelihood value calculation means which, for the object represented by the captured image, calculates a first likelihood value indicating object-likeness using the image of the region extracted by the extraction means and a machine learning system trained to respond to images of the predetermined region in images of the object wearing the attachment and in images of the object not wearing the attachment, calculates a second likelihood value indicating likeness to the object wearing the attachment using the image of the extracted region and a machine learning system trained to respond to images of the predetermined region in images of the object wearing the attachment, and calculates a third likelihood value indicating likeness to the object not wearing the attachment using the image of the extracted region and a machine learning system trained to respond to images of the predetermined region in images of the object not wearing the attachment; and determination means for determining, on the basis of the first, second, and third likelihood values calculated by the likelihood value calculation means, whether or not the object represented by the captured image is wearing the attachment.

In this way, a predetermined region in which the attachment is presumed to exist, if the object is assumed to be wearing it, is extracted from the captured image. Three likelihood values are then obtained using the image of the extracted region together with a machine learning system trained to respond to images of the predetermined region in images of the object both wearing and not wearing the attachment, a machine learning system trained to respond to images of the predetermined region in images of the object wearing the attachment, and a machine learning system trained to respond to images of the predetermined region in images of the object not wearing the attachment. Whether or not the object is wearing the attachment is determined on the basis of these three likelihood values, so the presence or absence of the attachment can be determined with high reliability, as in the invention of claim 1. In addition to the effect of the invention of claim 1, the invention of claim 2 makes the determination by focusing on the image of the predetermined region in which the attachment is presumed to exist rather than on the image of the entire object, which shortens the processing time.
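The extraction of the predetermined region can be illustrated with a minimal Python sketch. It assumes that the object is a face, that the attachment is a mask, and that the region is the lower part of a face bounding box; the function name `extract_attachment_region`, the box layout, and the parameter `top_fraction` are hypothetical and are not taken from the specification.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def extract_attachment_region(image: Image,
                              face_box: Tuple[int, int, int, int],
                              top_fraction: float = 0.5) -> Image:
    """Crop the part of a face box where a mask is presumed to sit.

    face_box is (left, top, right, bottom), with right and bottom exclusive.
    top_fraction is a hypothetical parameter: 0.5 keeps the lower half of
    the box. The specification only says the region is predetermined.
    """
    left, top, right, bottom = face_box
    height = bottom - top
    region_top = top + int(height * top_fraction)
    # Keep only the rows below region_top and the columns inside the box.
    return [row[left:right] for row in image[region_top:bottom]]
```

Passing only this cropped region to the three machine learning systems reduces the number of pixels each one processes, which is the source of the shorter processing time described above.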

As in the invention of claim 3, the determination means of the object determination device of claim 1 or 2 may determine whether or not the object represented by the captured image is wearing the attachment on the basis of the magnitude relationship among the first likelihood value, the second likelihood value, and the third likelihood value.

Further, as in the invention of claim 4, the determination means of the object determination device of claim 1 or 2 may determine whether or not the object represented by the captured image is wearing the attachment on the basis of the ratio between the first likelihood value and the second likelihood value and the ratio between the first likelihood value and the third likelihood value.
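A ratio-based determination could be sketched as follows. This is only an illustration under stated assumptions: the text says that the two ratios are used but does not fix their direction or any thresholds here, so the choice of p2/p1 and p3/p1, the guard `eps`, and the thresholds `masked_min` and `unmasked_max` are all hypothetical.

```python
from typing import Tuple


def likelihood_ratios(p1: float, p2: float, p3: float,
                      eps: float = 1e-6) -> Tuple[float, float]:
    """Return (p2 / p1, p3 / p1), guarding against a near-zero p1."""
    denom = max(p1, eps)
    return p2 / denom, p3 / denom


def is_wearing_by_ratio(p1: float, p2: float, p3: float,
                        masked_min: float = 0.9,
                        unmasked_max: float = 0.5) -> bool:
    """Hypothetical rule: masked if p2 is high and p3 is low relative to p1."""
    r2, r3 = likelihood_ratios(p1, p2, p3)
    return r2 >= masked_min and r3 <= unmasked_max
```

Dividing by p1 normalizes the two specialized likelihood values by the general object-likeness, so a weak detection does not by itself push the decision in either direction.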

The program of the invention of claim 5 causes a computer to execute: a first likelihood value calculation step of calculating, for an object represented by an image captured by imaging means for imaging the object, a first likelihood value indicating object-likeness using the captured image and a machine learning system trained to respond to images of the object wearing an attachment and images of the object not wearing the attachment; a second likelihood value calculation step of calculating, for the object represented by the captured image, a second likelihood value indicating likeness to the object wearing the attachment using the captured image and a machine learning system trained to respond to images of the object wearing the attachment; a third likelihood value calculation step of calculating, for the object represented by the captured image, a third likelihood value indicating likeness to the object not wearing the attachment using the captured image and a machine learning system trained to respond to images of the object not wearing the attachment; and a determination step of determining, on the basis of the calculated first, second, and third likelihood values, whether or not the object represented by the captured image is wearing the attachment.

Such a program operates in the same manner as the object determination device of claim 1, and therefore provides the same effect as that device. The first likelihood value calculation step, the second likelihood value calculation step, and the third likelihood value calculation step may be performed in any order.

The program of the invention of claim 6 causes a computer to execute: an extraction step of extracting, from an image captured by imaging means for imaging an object, a predetermined region in which an attachment is presumed to exist if the object is assumed to be wearing the attachment; a first likelihood value calculation step of calculating, for the object represented by the captured image, a first likelihood value indicating object-likeness using the image of the extracted region and a machine learning system trained to respond to images of the predetermined region in images of the object wearing the attachment and in images of the object not wearing the attachment; a second likelihood value calculation step of calculating, for the object represented by the captured image, a second likelihood value indicating likeness to the object wearing the attachment using the image of the extracted region and a machine learning system trained to respond to images of the predetermined region in images of the object wearing the attachment; a third likelihood value calculation step of calculating, for the object represented by the captured image, a third likelihood value indicating likeness to the object not wearing the attachment using the image of the extracted region and a machine learning system trained to respond to images of the predetermined region in images of the object not wearing the attachment; and a determination step of determining, on the basis of the calculated first, second, and third likelihood values, whether or not the object represented by the captured image is wearing the attachment.

Such a program operates in the same manner as the object determination device of claim 2, and therefore provides the same effect as that device. The first likelihood value calculation step, the second likelihood value calculation step, and the third likelihood value calculation step may be performed in any order.

As described above, according to the present invention, it is possible to determine with high reliability whether or not an object is wearing an attachment.

FIG. 1 is a schematic configuration diagram of an object determination device 10 according to an embodiment of the present invention. The object determination device 10 of the present embodiment photographs a person's face as the object and determines whether or not an attachment (here, a mask) is worn on the photographed face.

As shown in FIG. 1, the object determination device 10 includes a camera 14 for photographing a subject 12, an image capturing device 16, and an image processing device 18.

The image capturing device 16 includes an image memory. It captures the image data obtained by the camera 14 and stores it in the image memory. The image capturing device 16 also reads the image data from the image memory and inputs it to the image processing device 18.

FIG. 2 is a functional configuration diagram of the image processing device 18.

The image processing device 18 determines whether or not a mask is worn on the face of the subject 12 on the basis of the image data input from the image capturing device 16. It includes a face region detection unit 20, a likelihood calculation unit 22, and a determination unit 24.

The face region detection unit 20 detects the face region of the subject 12 from the image represented by the image data input from the image capturing device 16. There are various techniques for detecting a face region. In the present embodiment, the face detection method of H. Rowley et al., which uses a neural network (hereinafter, NN), a type of classifier with excellent pattern recognition capability, may be used (see "Neural Network-based Face Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, 1998, pp. 23-38). However, the face region detection unit 20 does not itself judge whether or not the subject 12 is wearing a mask, so the NN is trained in advance to detect the face of the subject 12 regardless of whether or not a mask is worn.

More specifically, an NN formed by connecting an input layer, an intermediate layer, and an output layer is made to learn its connection weights from known input/output data. Here, using the learning method called backpropagation, excitatory learning is performed by giving a teacher signal of 1, so that the output layer of the NN outputs a signal of 1, both for a large number of face images without a mask, as shown in FIG. 3(A), and for a large number of face images wearing masks of various shapes, as shown in FIG. 3(B). In addition, a large number of images completely different from faces (for example, landscape images) are prepared as non-face images, and inhibitory learning is performed by giving a teacher signal of 0 for these images. This NN is referred to as the mask/non-mask face reaction NN. The mask/non-mask face reaction NN is trained so that its output takes a value in the range of 0 to 1.
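The kind of training described here can be sketched with a small three-layer network in plain Python. This is a minimal illustration, not the network of the embodiment: the class `TinyNet`, the layer sizes, the learning rate, the squared-error loss, and the sigmoid units are assumptions chosen so that the output stays strictly between 0 and 1.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Input layer -> one hidden layer -> single sigmoid output unit."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x: Sequence[float]) -> float:
        return self.forward(x)[1]

    def train_step(self, x: Sequence[float], teacher: float,
                   lr: float = 0.5) -> None:
        """One backpropagation update toward the teacher signal (0 or 1)."""
        hidden, out = self.forward(x)
        # Gradient of the squared error at the output unit.
        delta_out = (out - teacher) * out * (1.0 - out)
        for j, h in enumerate(hidden):
            # Hidden delta uses the output weight before it is updated.
            delta_h = delta_out * self.w2[j] * h * (1.0 - h)
            self.w2[j] -= lr * delta_out * h
            self.b1[j] -= lr * delta_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
        self.b2 -= lr * delta_out


def train(net: TinyNet, samples, epochs: int = 1000, lr: float = 0.5) -> None:
    """samples is a list of (input vector, teacher signal) pairs."""
    for _ in range(epochs):
        for x, teacher in samples:
            net.train_step(x, teacher, lr)
```

In use, each face image (with or without a mask) would be flattened into an input vector paired with teacher signal 1, and each non-face image with teacher signal 0. The trained output can then be read directly as a value between 0 and 1.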

For the region detected by the face region detection unit 20, the likelihood calculation unit 22 calculates a first likelihood value p1 indicating face-likeness regardless of whether or not a mask is worn, a second likelihood value p2 indicating likeness to a face wearing a mask, and a third likelihood value p3 indicating likeness to a face not wearing a mask.

Specifically, the first likelihood value p1, which indicates face-likeness regardless of whether or not a mask is worn, is the output value obtained when the mask/non-mask face reaction NN used by the face region detection unit 20 for face region detection is applied to the detected face region. The first likelihood value p1 therefore takes a value in the range of 0 to 1.

The second likelihood value p2, which indicates likeness to a face wearing a mask, is calculated by applying an NN trained to respond to images of faces wearing a mask (hereinafter, the mask face reaction NN) to the face region detected by the face region detection unit 20. Like the mask/non-mask face reaction NN, the mask face reaction NN is trained by backpropagation: excitatory learning is performed with a teacher signal of 1 for a large number of face images wearing masks of various shapes, as shown in FIG. 3(B), and at the same time inhibitory learning is performed with a teacher signal of 0 for a large number of face images without a mask, as shown in FIG. 3(A). With this training, the output layer of the mask face reaction NN can be expected to output a high value for a face image wearing a mask and a low value for a face image not wearing a mask. The mask face reaction NN is trained so that its output takes a value in the range of 0 to 1. In the present embodiment, the output value obtained when the mask face reaction NN is applied to the face region detected by the face region detection unit 20 is used directly as the second likelihood value p2. The second likelihood value p2 therefore takes a value in the range of 0 to 1.

The third likelihood value p3, which indicates likeness to a face not wearing a mask, is calculated by applying an NN trained to respond only to images of faces not wearing a mask (the non-mask face reaction NN) to the face region detected by the face region detection unit 20. Like the mask/non-mask face reaction NN, the non-mask face reaction NN is trained by backpropagation: excitatory learning is performed with a teacher signal of 1 for a large number of face images without a mask, as shown in FIG. 3(A), and at the same time inhibitory learning is performed with a teacher signal of 0 for a large number of face images wearing masks of various shapes, as shown in FIG. 3(B). With this training, the output layer of the non-mask face reaction NN can be expected to output a high value for a face image not wearing a mask and a low value for a face image wearing a mask. The non-mask face reaction NN is trained so that its output takes a value in the range of 0 to 1. In the present embodiment, the output value obtained when the non-mask face reaction NN is applied to the face region detected by the face region detection unit 20 is used directly as the third likelihood value p3. The third likelihood value p3 therefore takes a value in the range of 0 to 1.
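The teacher signals given to the three NNs can be summarized in a small lookup table. The sketch below is illustrative only: the identifiers `Sample`, `TEACHER_SIGNALS`, and `teacher_signal` and the network keys are hypothetical names. Where the training description names no teacher signal for a combination (non-face images for the mask face reaction NN and the non-mask face reaction NN), the lookup returns `None` instead of guessing a label.

```python
from enum import Enum
from typing import Optional


class Sample(Enum):
    MASKED_FACE = "masked_face"      # face wearing a mask, FIG. 3(B)
    UNMASKED_FACE = "unmasked_face"  # face without a mask, FIG. 3(A)
    NON_FACE = "non_face"            # e.g. landscape images


# Teacher signals as stated in the training description of each NN.
TEACHER_SIGNALS = {
    "mask_nonmask_face_nn": {
        Sample.MASKED_FACE: 1,
        Sample.UNMASKED_FACE: 1,
        Sample.NON_FACE: 0,
    },
    "mask_face_nn": {
        Sample.MASKED_FACE: 1,
        Sample.UNMASKED_FACE: 0,
    },
    "nonmask_face_nn": {
        Sample.MASKED_FACE: 0,
        Sample.UNMASKED_FACE: 1,
    },
}


def teacher_signal(network: str, sample: Sample) -> Optional[int]:
    """Return the teacher signal, or None if none is stated for the pair."""
    return TEACHER_SIGNALS[network].get(sample)
```

Laid out this way, the three networks differ only in which face images are excitatory (1) and which are inhibitory (0), which is why their outputs p1, p2, and p3 respond differently to the same face region.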

The determination unit 24 uses the first likelihood value p1, the second likelihood value p2, and the third likelihood value p3 calculated by the likelihood calculation unit 22 to determine whether or not a mask is worn on the face of the subject 12 represented by the image photographed by the camera 14.

The components of the image capturing device 16 and the image processing device 18 described above are realized by a computer including a CPU, a RAM, and a ROM. That is, the CPU realizes the functions of these components by executing a program stored in the ROM or in a predetermined storage device, and the processing described below is performed. The components may be implemented on separate computers or on a single computer.

Next, the mask wearing determination process executed in the present embodiment will be described in detail.

FIG. 4 is a flowchart showing the flow of the mask wearing determination process performed by the image capturing device 16 and the image processing device 18 of the object determination device 10.

First, in step S100, the image capturing device 16 captures the image data obtained by the camera 14 and temporarily stores it in the image memory. It then reads the image data from the image memory and inputs it to the face region detection unit 20 of the image processing device 18. In step S102, the face region detection unit 20 detects a face region from the image represented by the image data input from the image capturing device 16, using the mask/non-mask face reaction NN.

ステップＳ１０４では、尤度算出部２２が、顔領域検出部２０で検出された顔領域に対してマスク／非マスク顔反応ＮＮを適用して第１尤度値ｐ１を算出する。前述したように、マスク／非マスク顔反応ＮＮは、マスクの装着の有無に拘わらず顔画像に対しては１が出力されるように、それ以外の画像に対しては０が出力されるように学習されているため、第１尤度値ｐ１は０から１の範囲の数値となる。 In step S104, the likelihood calculating unit 22 calculates the first likelihood value p1 by applying the mask / non-mask face reaction NN to the face area detected by the face area detecting unit 20. As described above, in the mask / non-mask face reaction NN, 1 is output for a face image regardless of whether or not a mask is attached, and 0 is output for other images. Therefore, the first likelihood value p1 is a numerical value in the range of 0 to 1.

ステップＳ１０６では、尤度算出部２２が、顔領域検出部２０で検出された顔領域に対してマスク顔反応ＮＮを適用して第２尤度値ｐ２を算出する。前述したように、マスク顔反応ＮＮは、マスクを装着した顔画像に対しては１が出力されるように、それ以外の画像に対しては０が出力されるように学習されているため、第２尤度値ｐ２は０から１の範囲の数値となり、マスクを装着した顔画像に適用したときの第２尤度値ｐ２のほうがマスクを装着していない顔画像に適用したときの第２尤度値ｐ２に比べて値が大きくなることが期待できる。 In step S 106, the likelihood calculating unit 22 calculates the second likelihood value p 2 by applying the mask face reaction NN to the face area detected by the face area detecting unit 20. As described above, the mask face reaction NN is learned so that 1 is output for a face image wearing a mask, and 0 is output for other images. The second likelihood value p2 is a numerical value in the range of 0 to 1, and the second likelihood value p2 when applied to a face image wearing a mask is the second value when applied to a face image not wearing a mask. It can be expected that the value becomes larger than the likelihood value p2.

In step S108, the likelihood calculation unit 22 calculates the third likelihood value p3 by applying the non-mask face reaction NN to the face area detected by the face area detection unit 20. As described above, the non-mask face reaction NN is trained to output 1 for a face image not wearing a mask and 0 for any other image. The third likelihood value p3 is therefore also a value in the range of 0 to 1, and it can be expected to be larger when the NN is applied to a face image not wearing a mask than when it is applied to a face image wearing a mask.

In step S110, the determination unit 24 determines whether the subject 12 is wearing a mask based on the first likelihood value p1, the second likelihood value p2, and the third likelihood value p3.

If the target face image is wearing a mask, the second likelihood value p2 can be expected to be close to 1 and the third likelihood value p3 close to 0. The first likelihood value p1, on the other hand, is the output of the mask/non-mask face reaction NN, which is trained to respond to face images both with and without a mask. It is therefore expected to be not as high as the second likelihood value p2 (although it may be about the same), but larger than the third likelihood value p3. Accordingly, when the first likelihood value p1, the second likelihood value p2, and the third likelihood value p3 satisfy the magnitude relationship shown in expression (1) below, the determination unit 24 determines that the detected face is wearing a mask.

p2 ≥ p1 > p3 ... (1)

Conversely, if the target face image is not wearing a mask, the second likelihood value p2 can be expected to be close to 0 and the third likelihood value p3 close to 1. As in the previous case, the first likelihood value p1 is expected to be not as high as the third likelihood value p3 (although it may be about the same), but larger than the second likelihood value p2. Accordingly, when the first likelihood value p1, the second likelihood value p2, and the third likelihood value p3 satisfy the magnitude relationship shown in expression (2) below, the determination unit 24 determines that the detected face is not wearing a mask.

p3 ≥ p1 > p2 ... (2)

If the magnitude relationship among the first likelihood value p1, the second likelihood value p2, and the third likelihood value p3 satisfies neither expression (1) nor expression (2), the determination unit 24 judges that the confidence is insufficient to decide whether a mask is worn and suspends the determination.
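The three-way outcome of expressions (1) and (2) can be sketched as follows. This is only an illustration under assumptions: the names `MaskDecision` and `decide_by_order` are hypothetical and do not appear in the specification, and the three likelihood values are taken as plain floats in the range 0 to 1.

```python
from enum import Enum


class MaskDecision(Enum):
    """Hypothetical labels for the three possible outcomes."""
    MASK = "mask"
    NO_MASK = "no_mask"
    UNDECIDED = "undecided"


def decide_by_order(p1: float, p2: float, p3: float) -> MaskDecision:
    """Apply expressions (1) and (2) to the three likelihood values."""
    if p2 >= p1 > p3:
        # Expression (1): the face is judged to be wearing a mask.
        return MaskDecision.MASK
    if p3 >= p1 > p2:
        # Expression (2): the face is judged not to be wearing a mask.
        return MaskDecision.NO_MASK
    # Neither expression holds: the determination is suspended.
    return MaskDecision.UNDECIDED
```

Expressions (1) and (2) cannot hold at the same time, because together they would require p2 ≥ p1 > p2, so the order in which they are checked does not affect the result.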

In this way, three likelihood values are obtained using three networks: the mask/non-mask face reaction NN, which responds to face images regardless of whether a mask is worn; the mask face reaction NN, which responds only to face images wearing a mask; and the non-mask face reaction NN, which responds only to face images not wearing a mask. Comparing the magnitudes of the three likelihood values gives a more reliable determination and keeps false detections low.

The result of the mask wearing determination process can be used, for example, in face state estimation. When it is determined that a mask is worn, feature amounts of the mouth and nose are difficult to detect from the image, so the face state may be estimated using other feature amounts. Switching the processing procedure according to whether a mask is worn makes stable face state estimation possible.

Although an example has been described here in which mask wearing is determined from the magnitude relationship among the first likelihood value p1, the second likelihood value p2, and the third likelihood value p3, it may instead be determined from the ratio between the first likelihood value p1 and the second likelihood value p2 and the ratio between the first likelihood value p1 and the third likelihood value p3.

Specifically, the ratio r1 between the first likelihood value p1 and the second likelihood value p2 is first obtained by equation (3) below, and the ratio r2 between the first likelihood value p1 and the third likelihood value p3 is obtained by equation (4) below.
r1 = p2 / p1 ... (3)
r2 = p1 / p3 ... (4)

Then, it is determined whether relational expression (5) below is satisfied.
r1 > th1 and r2 > th2 ... (5)

Here, th1 and th2 are thresholds for the determination and satisfy 0 < th1 ≤ th2. When relational expression (5) is satisfied, the detected face image is determined to be a face wearing a mask.

Similarly, the ratio r3 between the first likelihood value p1 and the third likelihood value p3 is obtained by equation (6) below, and the ratio r4 between the first likelihood value p1 and the second likelihood value p2 is obtained by equation (7) below.
r3 = p3 / p1 ... (6)
r4 = p1 / p2 ... (7)

Then, it is determined whether relational expression (8) below is satisfied.
r3 > th3 and r4 > th4 ... (8)

Here, th3 and th4 are thresholds for the determination and satisfy 0 < th3 ≤ th4. When relational expression (8) is satisfied, the detected face image is determined to be a face not wearing a mask.

If neither relational expression (5) nor (8) is satisfied, it may be judged that the confidence is insufficient to decide whether a mask is worn, and the determination may be suspended.
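The ratio-based variant of equations (3), (4), (6), and (7) and relational expressions (5) and (8) might look like the sketch below. The function name `decide_by_ratio`, the default threshold values, and the small constant `eps` used to avoid division by zero are all assumptions made for illustration. The specification only requires 0 < th1 ≤ th2 and 0 < th3 ≤ th4 and does not say how a zero likelihood value is handled.

```python
from typing import Optional


def decide_by_ratio(
    p1: float,
    p2: float,
    p3: float,
    th1: float = 0.9,   # assumed value; only 0 < th1 <= th2 is given
    th2: float = 1.5,   # assumed value
    th3: float = 0.9,   # assumed value; only 0 < th3 <= th4 is given
    th4: float = 1.5,   # assumed value
    eps: float = 1e-6,  # assumed guard against division by zero
) -> Optional[bool]:
    """Return True (mask), False (no mask), or None (determination suspended)."""
    d1, d2, d3 = max(p1, eps), max(p2, eps), max(p3, eps)

    r1 = p2 / d1  # equation (3)
    r2 = p1 / d3  # equation (4)
    if r1 > th1 and r2 > th2:  # relational expression (5)
        return True

    r3 = p3 / d1  # equation (6)
    r4 = p1 / d2  # equation (7)
    if r3 > th3 and r4 > th4:  # relational expression (8)
        return False

    return None
```

With these particular defaults, expressions (5) and (8) cannot both hold, since r1 > 0.9 and r4 > 1.5 contradict each other. For other threshold choices both could hold, in which case this sketch gives expression (5) priority simply because it is checked first.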

In this way, using ratios also makes it possible to determine reliably whether a mask is worn, as in the approach described above.

Although this embodiment has described an example of determining whether a mask is worn, whether sunglasses are worn can be determined by the same processing. That is, an NN that responds to face images regardless of whether sunglasses are worn, an NN that responds only to face images wearing sunglasses, and an NN that responds only to face images not wearing sunglasses are prepared. Obtaining three likelihood values with these three NNs and using them for the determination gives the same effect.

The device may also be configured to determine whether both a mask and sunglasses are worn. That is, an NN that responds to face images regardless of whether a mask and sunglasses are worn, an NN that responds only to face images wearing both a mask and sunglasses, and an NN that responds only to face images wearing neither a mask nor sunglasses are prepared. Obtaining three likelihood values with these three NNs and using them for the determination gives the same effect.

Although the above embodiment has described an example in which three different NNs are used to determine whether a mask is worn, the invention is not limited to this, and a machine learning system other than an NN, such as a support vector machine, may be used.

Although the above embodiment has described an example in which the NNs are applied to the image of the entire face to determine whether a mask is worn, the determination may instead be made using only the image of the lower half of the face. A modification that does so is described below.

The object determination device of this modification has the same configuration as the object determination device 10 of the above embodiment, except that the following components function as described below.

As in the above embodiment, the face area detection unit 20 of the image processing device 18 detects the face area of the subject 12 from the image represented by the image data obtained by the camera 14. It then divides the detected face area into upper and lower halves, cuts out the image data of the lower half, and outputs it to the likelihood calculation unit 22. The lower half is the region of the face image in which the mask is estimated to be present if the subject 12 is assumed to be wearing a mask.

For the image represented by the image data detected and cut out by the face area detection unit 20, the likelihood calculation unit 22 calculates the first likelihood value p1, which indicates face-likeness regardless of whether a mask is worn; the second likelihood value p2, which indicates likeness to a face wearing a mask; and the third likelihood value p3, which indicates likeness to a face not wearing a mask.

However, the mask/non-mask face reaction NN, the mask face reaction NN, and the non-mask face reaction NN used to obtain the first likelihood value p1, the second likelihood value p2, and the third likelihood value p3 are trained using the lower halves of images without a mask and the lower halves of images with a mask.

That is, the mask/non-mask face reaction NN is trained by backpropagation. Excitatory learning is performed with a teacher signal of 1 on the lower-half images of many face images not wearing a mask, as indicated by the broken line in FIG. 5(A), and on the lower-half images of many face images wearing a mask, as indicated by the broken line in FIG. 5(B). At the same time, inhibitory learning is performed with a teacher signal of 0 on non-face images that are entirely different from faces.

The mask face reaction NN is likewise trained by backpropagation. Excitatory learning is performed with a teacher signal of 1 on the lower-half images of many face images wearing a mask, as indicated by the broken line in FIG. 5(B). At the same time, inhibitory learning is performed with a teacher signal of 0 on the lower-half images of many face images not wearing a mask, as indicated by the broken line in FIG. 5(A), and on non-face images.

The non-mask face reaction NN is also trained by backpropagation. Excitatory learning is performed with a teacher signal of 1 on the lower-half images of many face images not wearing a mask, as indicated by the broken line in FIG. 5(A). At the same time, inhibitory learning is performed with a teacher signal of 0 on the lower-half images of many face images wearing a mask, as indicated by the broken line in FIG. 5(B), and on non-face images.
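The teacher signals assigned to each of the three networks can be summarized as a lookup table. The sketch below only restates the labeling scheme described above. The string keys and the helper `teacher_signal` are hypothetical names chosen for illustration, and the backpropagation training itself is not shown.

```python
# Kinds of training sample (lower-half crops for faces, arbitrary non-face images).
FACE_NO_MASK = "face_no_mask"
FACE_MASK = "face_mask"
NON_FACE = "non_face"

# Teacher signal per network and sample kind:
# 1.0 means excitatory learning, 0.0 means inhibitory learning.
TEACHER_SIGNALS = {
    "mask_or_no_mask_nn": {FACE_NO_MASK: 1.0, FACE_MASK: 1.0, NON_FACE: 0.0},
    "mask_nn":            {FACE_NO_MASK: 0.0, FACE_MASK: 1.0, NON_FACE: 0.0},
    "no_mask_nn":         {FACE_NO_MASK: 1.0, FACE_MASK: 0.0, NON_FACE: 0.0},
}


def teacher_signal(network: str, sample_kind: str) -> float:
    """Look up the target output for one network and one kind of training sample."""
    return TEACHER_SIGNALS[network][sample_kind]
```

Laid out this way, the three networks differ only in which face samples are treated as positive examples, while non-face images are negative examples for all of them.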

Note that the NN used by the face area detection unit 20 to detect the face area must detect the whole face, so it is trained to respond to the entire face image regardless of whether a mask is worn. Accordingly, in this modification, the NN used by the face area detection unit 20 and the NN used by the likelihood calculation unit 22 to calculate the first likelihood value p1 are prepared and trained separately.

FIG. 6(A) is a flowchart showing the flow of the mask wearing determination process performed by the image capturing device 16 and the image processing device 18 in this modification.

Steps S200 to S202 capture the image data obtained by the camera 14 and detect the face area (see FIG. 6(B)). They are performed in the same way as steps S100 to S102 above, so their description is omitted.

In step S204, the face area detection unit 20 divides the detected face area of the subject 12, in the image represented by the image data input from the image capturing device 16, into upper and lower halves, cuts out the image data of the lower half, and outputs it to the likelihood calculation unit 22 (see FIG. 6(C)).
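The cut-out in step S204 can be illustrated with a minimal sketch. It assumes the image is a list of pixel rows and the detected face area is given as a hypothetical `(top, left, height, width)` tuple. Neither representation is specified in the source, and for an odd height the extra row is assigned to the lower half here as an arbitrary choice.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (top, left, height, width)


def crop_lower_half(image: Sequence[Sequence[float]], box: Box) -> List[List[float]]:
    """Cut out the lower half of the detected face area from the image."""
    top, left, height, width = box
    middle = top + height // 2  # first row of the lower half
    bottom = top + height       # one past the last row of the face area
    return [list(row[left:left + width]) for row in image[middle:bottom]]
```

For the sunglasses case mentioned later in the text, the same idea would apply to the rows from `top` up to `middle` instead.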

In step S206, the likelihood calculation unit 22 calculates the first likelihood value p1 by applying the mask/non-mask face reaction NN to the image represented by the lower-half image data cut out by the face area detection unit 20.

In step S208, the likelihood calculation unit 22 calculates the second likelihood value p2 by applying the mask face reaction NN to the image represented by the lower-half image data cut out by the face area detection unit 20.

In step S210, the likelihood calculation unit 22 calculates the third likelihood value p3 by applying the non-mask face reaction NN to the image represented by the lower-half image data cut out by the face area detection unit 20.

In step S212, the determination unit 24 determines whether the subject 12 is wearing a mask based on the first likelihood value p1, the second likelihood value p2, and the third likelihood value p3. The determination method is the same as in step S110 above, so its description is omitted.

In this way, determining whether a mask is worn using only the image of the lower half of the face shortens the processing time.

Although this modification has described the case of determining whether a mask is worn, whether sunglasses are worn may be determined instead. In that case, the upper half of the face image is cut out by the same procedure as in the above modification, and the determination is made using a sunglasses/non-sunglasses face reaction NN, a sunglasses face reaction NN, and a non-sunglasses face reaction NN trained on the upper-half region of face images.

Although the above embodiment and modification have described an example in which the likelihood values are calculated in the order of the first likelihood value p1, the second likelihood value p2, and the third likelihood value p3, the order in which the three likelihood values are calculated is not particularly limited.

Although the above embodiment and modification have described an example of determining whether a mask or the like is worn on a person's face, the invention is not limited to this and can be applied to various objects.

FIG. 1 is a schematic configuration diagram of an object determination device according to an embodiment of the present invention.
FIG. 2 is a functional configuration diagram of the image processing device.
FIG. 3(A) shows an example of a face image not wearing a mask, and FIG. 3(B) shows an example of a face image wearing a mask.
FIG. 4 is a flowchart showing the flow of the mask wearing determination process performed by the image capturing device and the image processing device of the object determination device.
FIG. 5(A) shows an example of the cut-out region when the lower half of a face image not wearing a mask is cut out, and FIG. 5(B) shows an example of the cut-out region when the lower half of a face image wearing a mask is cut out.
FIG. 6(A) is a flowchart showing the flow of the mask wearing determination process performed by the image capturing device and the image processing device in the modification, FIG. 6(B) shows an example of a face area detection result, and FIG. 6(C) shows the image of the lower half cut out from the image of the face area.

Explanation of symbols

10 Object determination device
12 Subject
14 Camera
16 Image capturing device
18 Image processing device
20 Face area detection unit
22 Likelihood calculation unit
24 Determination unit

Claims

An object determination device comprising:
imaging means for imaging an object;
likelihood value calculation means for calculating, for the object represented by the image captured by the imaging means: a first likelihood value indicating object-likeness, using the captured image and a machine learning system trained to respond to images of the object wearing an attachment and to images of the object not wearing the attachment; a second likelihood value indicating likeness to the object wearing the attachment, using the captured image and a machine learning system trained to respond to images of the object wearing the attachment; and a third likelihood value indicating likeness to the object not wearing the attachment, using the captured image and a machine learning system trained to respond to images of the object not wearing the attachment; and
determination means for determining, based on the first likelihood value, the second likelihood value, and the third likelihood value calculated by the likelihood value calculation means, whether the object represented by the image captured by the imaging means is wearing the attachment.

An object determination device comprising:
imaging means for imaging an object;
extraction means for extracting, from the image captured by the imaging means, a predetermined region in which an attachment is estimated to be present if the object is assumed to be wearing the attachment;
likelihood value calculation means for calculating, for the object represented by the image captured by the imaging means: a first likelihood value indicating object-likeness, using the image of the region extracted by the extraction means and a machine learning system trained to respond to images of the predetermined region in images of the object wearing the attachment and to images of the predetermined region in images of the object not wearing the attachment; a second likelihood value indicating likeness to the object wearing the attachment, using the image of the region extracted by the extraction means and a machine learning system trained to respond to images of the predetermined region in images of the object wearing the attachment; and a third likelihood value indicating likeness to the object not wearing the attachment, using the image of the region extracted by the extraction means and a machine learning system trained to respond to images of the predetermined region in images of the object not wearing the attachment; and
determination means for determining, based on the first likelihood value, the second likelihood value, and the third likelihood value calculated by the likelihood value calculation means, whether the object represented by the image captured by the imaging means is wearing the attachment.

The object determination device according to claim 1, wherein the determination means determines, based on the magnitude relationship among the first likelihood value, the second likelihood value, and the third likelihood value, whether the object represented by the image captured by the imaging means is wearing the attachment.

The object determination device according to claim 1, wherein the determination means determines, based on the ratio between the first likelihood value and the second likelihood value and the ratio between the first likelihood value and the third likelihood value, whether the object represented by the image captured by the imaging means is wearing the attachment.

A program for causing a computer to execute:
a first likelihood value calculating step of calculating, for an object represented by an image captured by imaging means for imaging the object, a first likelihood value indicating object-likeness, using the captured image and a machine learning system trained to respond to images of the object wearing an attachment and to images of the object not wearing the attachment;
a second likelihood value calculating step of calculating, for the object represented by the image captured by the imaging means, a second likelihood value indicating likeness to the object wearing the attachment, using the captured image and a machine learning system trained to respond to images of the object wearing the attachment;
a third likelihood value calculating step of calculating, for the object represented by the image captured by the imaging means, a third likelihood value indicating likeness to the object not wearing the attachment, using the captured image and a machine learning system trained to respond to images of the object not wearing the attachment; and
a determination step of determining, based on the calculated first likelihood value, second likelihood value, and third likelihood value, whether the object represented by the image captured by the imaging means is wearing the attachment.

A program for causing a computer to execute:
an extraction step of extracting, from an image captured by imaging means for imaging an object, a predetermined region in which an attachment is estimated to be present if the object is assumed to be wearing the attachment;
a first likelihood value calculating step of calculating, for the object represented by the image captured by the imaging means, a first likelihood value indicating object-likeness, using the image of the extracted region and a machine learning system trained to respond to images of the predetermined region in images of the object wearing the attachment and to images of the predetermined region in images of the object not wearing the attachment;
a second likelihood value calculating step of calculating, for the object represented by the image captured by the imaging means, a second likelihood value indicating likeness to the object wearing the attachment, using the image of the extracted region and a machine learning system trained to respond to images of the predetermined region in images of the object wearing the attachment;
a third likelihood value calculating step of calculating, for the object represented by the image captured by the imaging means, a third likelihood value indicating likeness to the object not wearing the attachment, using the image of the extracted region and a machine learning system trained to respond to images of the predetermined region in images of the object not wearing the attachment; and
a determination step of determining, based on the calculated first likelihood value, second likelihood value, and third likelihood value, whether the object represented by the image captured by the imaging means is wearing the attachment.