JP2021111228A

JP2021111228A - Learning device, learning method, and program

Info

Publication number: JP2021111228A
Application number: JP2020003945A
Authority: JP
Inventors: 孝嗣牧田; Takatsugu Makita; 貴久山本; Takahisa Yamamoto; 敦夫野本; Atsuo Nomoto
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-01-14
Filing date: 2020-01-14
Publication date: 2021-08-02

Abstract

PROBLEM TO BE SOLVED: To improve the quality of extended data generated by data expansion for learning. SOLUTION: A model is trained to output a determination result for input data. An editing parameter for a plurality of learning data, obtained by editing original data representing a determination target, is determined for each original data. Based on the parameter, the plurality of learning data, each representing the determination target, is generated from the original data. Each of the plurality of learning data is used to train the model. SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning device, a learning method, and a program, and in particular to a technique for expanding the learning data used for machine learning.

Methods are known for training a model such as a neural network with learning data in order to perform tasks such as detecting or classifying data, or authentication based on data. In general, a large amount of learning data is needed to achieve sufficient performance, so a method for efficiently generating learning data is desired.

Methods of artificially increasing the amount of learning data are referred to as "data expansion", "data augmentation", or "data padding" (hereinafter, "data expansion"). For example, Patent Document 1 proposes a method of performing data expansion by applying vertical and horizontal mirror transformations and image rotation to the learning images used for learning.

Further, Non-Patent Document 1 proposes a method of performing data expansion in a general object detection task by preparing learning images of 256 × 256 pixels and cropping partial regions of 224 × 224 pixels from them.

Patent Document 1: Japanese Unexamined Patent Publication No. 2004-361092

Non-Patent Document 1: K. Chatfield et al., "Return of the Devil in the Details: Delving Deep into Convolutional Nets", arXiv:1405.3531v4, 5 Nov 2014 (https://arxiv.org/pdf/1405.3531.pdf)

The quality of the data generated by data expansion (hereinafter, "extended data") affects the quality of the model obtained by learning. That is, if the extended data does not meet the quality required of learning data, it acts as noise and lowers the quality and efficiency of learning.

In the prior art, the quality of extended data could vary. For example, when data expansion is performed by extracting partial regions from an image, a partial region that does not contain the object to be detected or classified may be extracted as extended data; in other words, the extended data may become noise.

An object of the present invention is to improve the quality of extended data generated by data expansion for learning.

In order to achieve the object of the present invention, a learning device according to one embodiment of the present invention has, for example, the following configuration. That is, it is
a learning device that uses learning data to train a model that outputs a determination result for input data, comprising:
determination means for determining, for each original data, an editing parameter for a plurality of learning data obtained by editing the original data representing a determination target;
generation means for generating, from the original data and based on the parameter, a plurality of learning data each representing the determination target; and
learning means for training the model using each of the plurality of learning data.

The quality of extended data generated by data expansion for learning can be improved.

A block diagram of the control unit 401 of a learning device according to one embodiment.
A flowchart showing an example of the processing flow of a learning method according to one embodiment.
A diagram showing an example of model accuracy evaluation results.
A block diagram showing a configuration example of a learning device according to one embodiment.
A diagram explaining the relationship between a basic image and extended images.
A diagram showing an example of the relationship between blur intensity and shift width.
A diagram explaining the relationship between an original image and learning images.
A block diagram of the basic configuration of a computer used in one embodiment.
A diagram explaining the cut-off region of an extended image.
A diagram showing examples of extended images corresponding to three different basic images.
A diagram explaining a method of estimating the distance between center points using reference values.
A diagram showing an example of a user interface for adjusting the shift width.
A diagram showing an example of the relationship between likelihood and shift width.
A diagram showing an example of reference values attached to an original image.
A diagram showing an example of a user interface for removing learning images.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although a plurality of features are described in the embodiments, not all of them are essential to the invention, and the features may be combined arbitrarily. Further, in the accompanying drawings, the same or similar configurations are given the same reference numbers, and duplicate descriptions are omitted.

[Embodiment 1]
The learning device according to the present embodiment determines editing parameters for generating extended data according to the original data used for learning. The extended data generated by this processing is used to train a model, such as a neural network, that outputs a determination result for input data.

In the prior art, data expansion was performed by the same method for all original data. Therefore, when the properties of the original data vary, for example when the position of the subject varies between original images, extended data that acts as noise was likely to be generated. In the present embodiment, by contrast, the parameters for data expansion are determined according to the content of each individual original data. According to the present embodiment, the amount of extended data that acts as noise can be reduced, and the quality of the extended data can be improved.

In the following, a case of training a model that outputs a determination result for the data of an original image containing an image of the determination target is described as an example. The data obtained from the entire original image is called "original data" in the following description. This model is a neural network model for face recognition that estimates the person shown in an image, and it can estimate the name of the person shown by face image processing. Even when the person's name is unknown, it may still be possible to make an estimate that distinguishes individuals. First, an outline of the processing example described below is given.
(1) To train the face recognition model, a basic image that can be used as part of the learning data is created from an original image, which is original data showing a person's face. The basic image is a partial image cut out from the original image. In addition, to evaluate the face recognition model, evaluation data is created from images showing a person's face.
(2) An extended image, which is extended data, is generated by shifting the position of the basic image showing the person's face. Extended images are generated by shifting the cut-out position in the original image in various ways. The extended images generated in this way are also used for learning.

FIG. 4 shows a configuration example of a monitoring system that is one embodiment of the present invention and functions as a learning device capable of performing data expansion and model training. This learning device can train a model that outputs a determination result for input data. The monitoring system shown in FIG. 4 has a control unit 401 and a learning unit 402.

The control unit 401 generates a plurality of learning data from the original data, as described later. For example, the control unit 401 can receive an image from the data holding unit 407, generate learning images from the received image, and transmit them to the data holding unit 407. The control unit 401 may also generate evaluation data for evaluating the determination accuracy of the model. For example, the control unit 401 can generate an evaluation image from an image received from the data holding unit 407 and transmit it to the data holding unit 407.

The learning unit 402 uses each of the plurality of learning data to train a model that outputs a determination result for input data. For example, the learning unit 402 can receive learning images from the data holding unit 407 and train the model using the received learning images. In training the model, pairs of learning data and teacher data indicating the correct determination result for that learning data can be used. In the present embodiment, the determination result for the original data can be set in advance, and the learning unit 402 can train the model so that it outputs that same determination result when given each of the plurality of learning data generated from the original data. The learning unit 402 can transmit the trained model to the data holding unit 407.
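The pairing of learning data with teacher data described above can be sketched as follows. This is a minimal illustration only: the function name `build_training_pairs` and the dictionary layout are hypothetical and do not appear in the description, and the "images" are placeholder strings.

```python
from typing import Dict, Hashable, List, Tuple


def build_training_pairs(
    learning_images_by_original: Dict[Hashable, List[object]],
    label_by_original: Dict[Hashable, object],
) -> List[Tuple[object, object]]:
    """Pair every learning image with the label of its original image.

    Hypothetical helper: all learning images derived from one original
    image (basic image and extended images) share that original's
    predetermined determination result as teacher data.
    """
    pairs = []
    for original_id, images in learning_images_by_original.items():
        label = label_by_original[original_id]
        for image in images:
            pairs.append((image, label))
    return pairs


# Usage: one original image "AAA" yields a basic image and two extended images.
pairs = build_training_pairs(
    {"AAA": ["basic", "ext_1", "ext_2"]},
    {"AAA": "person_1"},
)
```

Every pair carries the same label, which reflects the requirement that the model output the same determination result for all learning data generated from one original data.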

The monitoring system according to one embodiment of the present invention may further include an evaluation unit 403. The evaluation unit 403 can use the evaluation data to evaluate the determination accuracy of the trained model. For example, the evaluation unit 403 can receive the trained model and evaluation images from the data holding unit 407. The evaluation unit 403 can then evaluate the performance of the trained model using the evaluation images and transmit the result of the performance evaluation to the data holding unit 407.

The monitoring system according to one embodiment of the present invention may further include a display unit 404, an operation input unit 405, and a data holding unit 407. The display unit 404 can receive learning images from the data holding unit 407 and display them. The display unit 404 can also display an interface for setting the parameters used to generate learning data. The operation input unit 405 can set the parameters used to generate learning data. For example, the user can perform operations such as mouse or touch panel operations while looking at the display unit 404, and the operation input unit 405 can set the parameters according to the user's operations.

As shown in FIG. 4, the above-described units can be connected via a network such as a LAN (Local Area Network) 406.

In the present embodiment, each processing unit shown in FIGS. 1 and 4 may be realized by a computer or by dedicated hardware. In the present embodiment, at least part of the processing is executed by a computer. The learning device or monitoring system according to one embodiment of the present invention may also be composed of, for example, a plurality of information processing devices connected via a network.

FIG. 8 is a diagram showing the basic configuration of a computer. In FIG. 8, the processor 810 is, for example, a CPU, and controls the operation of the entire computer. The memory 820 is, for example, a RAM, and temporarily stores programs, data, and the like. The computer-readable storage medium 830 is, for example, a hard disk or a CD-ROM, and stores programs, data, and the like over the long term. In the present embodiment, the programs stored in the storage medium 830 that realize the functions of each unit are read into the memory 820. The functions of each processing unit are then realized by the processor 810 operating according to the programs in the memory 820.

In FIG. 8, the input interface 840 is an interface for acquiring information from an external device. The output interface 850 is an interface for outputting information to an external device. The bus 860 connects the above-described units and enables data to be exchanged among them.

FIG. 1 is a block diagram showing a detailed configuration example of the control unit 401. The control unit 401 receives an original image 104 as input, and generates and outputs learning images 106. The control unit 401 has a determination unit that determines, for each original data representing a determination target, a parameter indicating the difference between the plurality of learning data. In the example of FIG. 1, the range setting unit 101 and the parameter setting unit 102 can determine this parameter. The control unit 401 also has an extension unit that generates, from the original data and based on the parameter, a plurality of learning data each representing the determination target. In the example of FIG. 1, the image generation unit 103 and the detection unit 107 can function as the extension unit that generates the learning data.

The detection unit 107 detects a face region from the original image 104. The detection unit 107 can detect zero or one face region from the original image 104. A large number of learning images are used to train the model. It is important that the position of the face is determined accurately in the learning images as well, but because the number of learning images is usually enormous, it is difficult for an operator to set the face regions by visual inspection. Therefore, the detection unit 107 detects the face region from the original image 104. For example, the detection unit 107 can detect the face region using an image processing method that automatically detects rectangular regions of objects in an image, together with a likelihood for each.

In the present embodiment, the detection unit 107 detects the face region using the method described in Redmon (J. Redmon et al., "YOLOv3: An Incremental Improvement", arXiv:1804.02767), but other methods may be used. When the method disclosed in Redmon is used, the number of output rectangles representing face regions is not known in advance. Therefore, when the number of rectangles output for the original image 104 is zero, the detection unit 107 can exclude the original image 104 from learning. When the number of output rectangles is one, the detection unit 107 can use the output rectangle as the face region as it is. When the number of output rectangles is two or more, the detection unit 107 can select the one rectangle with the highest likelihood of representing a face and use it as the face region.
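The three cases above (zero, one, or several rectangles) reduce to a single selection rule, sketched below. The `Detection` record and the function name `select_face_region` are hypothetical names introduced for illustration; the detector itself is not modeled, and its output is assumed to be a list of rectangles with likelihoods.

```python
from typing import List, NamedTuple, Optional


class Detection(NamedTuple):
    """Hypothetical detector output: a rectangle plus a face likelihood."""
    left: int
    top: int
    width: int
    height: int
    likelihood: float


def select_face_region(detections: List[Detection]) -> Optional[Detection]:
    """Return the face region to use, or None to exclude the original image.

    - no rectangles: the original image is excluded from learning (None)
    - one rectangle: it is used as is
    - two or more: the rectangle with the highest likelihood is used
    """
    if not detections:
        return None
    return max(detections, key=lambda d: d.likelihood)


# Usage: two candidate rectangles; the more likely one is kept.
candidates = [
    Detection(10, 10, 50, 50, 0.62),
    Detection(80, 20, 48, 48, 0.91),
]
chosen = select_face_region(candidates)
```

Because `max` over a single-element list returns that element, the one-rectangle and many-rectangle cases share the same code path.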

FIG. 7(A) shows an original image 701 and a region 702, which is the face region detected from the original image 104. The image of such a region 702 can be used to train the model. Hereinafter, a single image (data) representing the determination target, obtained from the original image (original data) by a specific method such as an automatic detection technique, is called a basic image (basic data). The basic image is the image of a partial region, detected from the original image, that contains the image of the determination target. The learning images (learning data) used to train the model can include such basic images (basic data). The learning device according to the present embodiment may instead acquire a basic image (basic data) directly, in which case the basic image (basic data) can be used as the original image (original data). In the present embodiment, extended images (extended data) are generated in addition to the basic image (basic data) in order to increase the amount of learning data. For example, as shown in FIG. 5, a plurality of extended images 502 corresponding to one basic image 501 can be generated.

The range setting unit 101 determines, for each original data representing a determination target, a parameter indicating the difference between the plurality of learning data. In the present embodiment, the range setting unit 101 sets a shift width, which indicates the amount of position shift of the image of the determination target between the plurality of learning images and is used to generate the extended images. As described later, this shift width is used to determine the cut-out positions of the extended images. The shift width can therefore be regarded as a parameter representing the difference in the position of the image of the determination target between the basic image and an extended image, or between extended images.

The range setting unit 101 can determine the parameter based on the estimated error between the position of the image of the determination target detected from the original image and the position of the partial region from which the basic image is extracted. In the present embodiment, the range setting unit 101 estimates the distance between the center point of the face and the center point of the face region detected by the detection unit 107 (hereinafter, the distance between center points), and sets the shift width according to the estimated distance. Here, to improve the accuracy of the model, it is considered desirable that the amount of the face that is cut off in an extended image (the size of the part of the face that falls outside the image region) be as small as possible, while the position shift amount be as large as possible. The larger the position shift amount, the more likely the face is to be cut off (part of the face disappears from the image), so the cut-off amount and the position shift amount are in a trade-off relationship. For the same shift amount, the larger the distance between center points, the more likely it is that part of the face will be missing from the extended image. Therefore, the range setting unit 101 can set the shift width so that the shift amount becomes smaller as the distance between center points becomes larger.

The range setting unit 101 can also determine the parameter based on the image of the determination target contained in the original image. For example, the range setting unit 101 can estimate the blur intensity in the face region detected by the detection unit 107 and set the shift width according to the estimated blur intensity. In general, the higher the blur intensity of an image, the more difficult face detection becomes, and the lower the positional accuracy of the detected face region tends to be. That is, there is a positive correlation between the blur intensity and the distance between center points, and the distance between center points can be considered to increase as the blur intensity increases.

In the present embodiment, the range setting unit 101 estimates the blur intensity using the method described in Masashi Nishiyama et al., "PSF estimation for blur removal in face recognition", IPSJ SIG Technical Report, Computer Vision and Image Media (CVIM), 2008(3(2008-CVIM-161)), pp. 61-68, but other methods may be used. The blur intensity ranges from 0 to 100. The range setting unit 101 can set the shift width so that the shift amount becomes smaller as the blur intensity becomes larger. For example, the range setting unit 101 can set the shift width to be inversely proportional to the blur intensity. In the present embodiment, the range setting unit 101 sets the shift width using a preset correspondence between blur intensity and shift width, such as table 601 shown in FIG. 6(A).
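A preset correspondence of this kind can be sketched as a simple lookup. The contents of table 601 are not reproduced in this text, so the bin boundaries and shift widths below are assumptions, chosen only so that blur intensities of 10, 50, and 90 map to shift widths of 10, 6, and 2 as in the FIG. 10 example; the function name `shift_width_for_blur` is likewise hypothetical.

```python
import bisect

# Hypothetical stand-in for table 601 (actual table contents are not given).
# Each upper bound is inclusive: blur <= 20 -> 10, blur <= 40 -> 8, and so on.
BLUR_UPPER_BOUNDS = [20, 40, 60, 80, 100]
SHIFT_WIDTHS = [10, 8, 6, 4, 2]


def shift_width_for_blur(blur_intensity: float) -> int:
    """Look up the shift width for a blur intensity in the range 0 to 100.

    Higher blur intensity yields a smaller shift width.
    """
    if not 0 <= blur_intensity <= 100:
        raise ValueError("blur intensity must be between 0 and 100")
    index = bisect.bisect_left(BLUR_UPPER_BOUNDS, blur_intensity)
    return SHIFT_WIDTHS[index]


# Usage: the three blur intensities used in the FIG. 10 example.
widths = [shift_width_for_blur(b) for b in (10, 50, 90)]
```

A step table keeps the mapping easy to tune per data set; a continuous function of the blur intensity would serve equally well if finer control were wanted.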

The parameter setting unit 102 determines the parameters used to generate the extended images. In the present embodiment, the parameter setting unit 102 determines the parameters according to the shift width that the range setting unit 101 set according to the original data (for example, the blur intensity of the face region). For example, the parameter setting unit 102 can determine the cut-out positions of the extended images in the original image 104. In the present embodiment, the parameter setting unit 102 determines the cut-out positions of the extended images by shifting the face region up, down, left, or right on the original image 104 according to the shift width that the range setting unit 101 determined according to the blur intensity.

The parameter setting unit 102 can further determine the number of extended images to generate from one original image. In the present embodiment, the number of extended images generated is four. As shown in FIG. 9, the region 901 used to generate an extended image may include an area outside the original image 701. The parameter setting unit 102 can therefore also set the method of filling in the pixel values of the cut-off region of an extended image (the portion for which the original image 701 has no corresponding data). For example, the pixel values of the area of the extended image that lies outside the original image 701 can be filled with the median of the value range of the pixel values of the original image 701. For example, when the pixel values of the original image 701 are integers ranging from 0 to 255, the pixel value can be set to 127 or 128. Alternatively, the area can be filled using background image data prepared in advance, or the shift width can be reduced so that the region does not include any area outside the original image 701.
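The first filling method, using the middle of the pixel value range, can be sketched as follows. This is an illustrative sketch only: images are represented as nested lists of grayscale values, and `crop_with_fill` and its parameters are hypothetical names not taken from the description.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values, 0 to 255


def crop_with_fill(image: Image, left: int, top: int,
                   width: int, height: int, fill: int = 128) -> Image:
    """Cut a width x height region out of `image`, starting at (left, top).

    Pixels of the region that fall outside `image` (the cut-off region)
    are set to `fill`, here the middle of the 0 to 255 value range.
    """
    image_height = len(image)
    image_width = len(image[0]) if image_height else 0
    cropped = []
    for y in range(top, top + height):
        row = []
        for x in range(left, left + width):
            inside = 0 <= y < image_height and 0 <= x < image_width
            row.append(image[y][x] if inside else fill)
        cropped.append(row)
    return cropped


# Usage: a 2 x 2 image cropped with a region that extends past its
# lower-right corner.
original = [[1, 2],
            [3, 4]]
extended = crop_with_fill(original, left=1, top=1, width=2, height=2)
```

Only the pixel at the region's upper left overlaps the original image here; the other three positions receive the fill value.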

The image generation unit 103 generates, based on the parameters, the data of a plurality of learning images each containing the image of the determination target from the data of the original image containing the image of the determination target. In the present embodiment, the image generation unit 103 generates extended images from the original image 104 using the parameters output by the parameter setting unit 102. The image generation unit 103 can also generate a basic image from the original image 104 according to the detection result from the detection unit 107. The learning images 106 (learning data) used to train the model can include such extended images (extended data) and basic images (basic data).

An example of a method for generating extended images is described below. In the present embodiment, an extended image corresponds to the image of the region obtained after an edit that shifts the position of the partial region from which the basic image is extracted according to the shift width. More specifically, an extended image is the image of the region obtained by shifting the position of the partial region from which the basic image is extracted in the vertical and horizontal directions of the original image by a distance according to the shift width. For example, in FIG. 7, the region 702 is the cut-out region for generating the basic image, and is the face region detected by the detection unit 107. The region 703 is an example of a cut-out region for generating an extended image, and the distance 704 indicates the shift width. In the example of FIGS. 7(A) to 7(D), four cut-out regions for generating extended images are obtained by shifting the region 702 by the distance 704, according to the shift width, in four directions: lower right, lower left, upper left, and upper right. For example, the region 703 is obtained by shifting the region 702 toward the lower right of the image by the distance 704. Similarly, the regions 705, 706, and 707 are the remaining three cut-out regions for generating extended images.
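The four cut-out regions can be computed from the basic image's region and the shift width as sketched below. The sketch assumes that the shift width is applied along each axis separately (rather than along the diagonal) and that the y coordinate increases downward; the `Region` layout and the name `shifted_regions` are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def shifted_regions(base: Region, shift_width: int) -> List[Region]:
    """Return the four cut-out regions for the extended images.

    The base region is shifted toward the lower right, lower left,
    upper left, and upper right, in that order. Assumption: the shift
    width is applied to both the horizontal and the vertical axis.
    """
    left, top, width, height = base
    offsets = [
        (+shift_width, +shift_width),  # lower right
        (-shift_width, +shift_width),  # lower left
        (-shift_width, -shift_width),  # upper left
        (+shift_width, -shift_width),  # upper right
    ]
    return [(left + dx, top + dy, width, height) for dx, dy in offsets]


# Usage: a 40 x 40 face region at (100, 50) with a shift width of 10.
regions = shifted_regions((100, 50, 40, 40), 10)
```

The size of the region is unchanged by the shift, so every extended image has the same dimensions as the basic image.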

FIG. 10 shows the relationship between basic images and extended images generated from three different original images. The basic image with image ID AAA has an estimated blur intensity of 10, and its shift width is set to 10 based on Table 601. The basic image with image ID BBB has a higher blur intensity than the basic image with image ID AAA, estimated at 50, so its shift width is smaller and is set to 6. The basic image with image ID CCC has an even higher blur intensity than the basic image with image ID BBB, estimated at 90, so its shift width is smaller still and is set to 2. The method of generating extended images is not limited to the above; for example, extended images can be generated by applying edits such as rotation, enlargement, or reduction to the basic image (or the original image). In this case as well, the processing parameters can be changed according to the basic image (or the original image).
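
The mapping from blur intensity to shift width could be expressed as a formula instead of a table. The sketch below is only an assumption: it uses a linear rule chosen because it reproduces the three example values given for FIG. 10 (blur 10, 50, and 90 giving shift widths 10, 6, and 2). The actual contents of Table 601 are not reproduced here, and the function name is hypothetical.

```python
def shift_width_for_blur(blur_intensity: float) -> int:
    """Hypothetical linear rule: a stronger blur gives a smaller shift width.

    Matches only the three examples in the text (10 -> 10, 50 -> 6, 90 -> 2);
    the result is clamped to the range 1..10.
    """
    width = round(11 - blur_intensity / 10)
    return max(1, min(10, width))


# Shift widths for the three example images AAA, BBB, and CCC.
example_widths = [shift_width_for_blur(b) for b in (10, 50, 90)]
```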

The learning unit 402 trains the model using each of the plurality of learning data. For example, the learning unit 402 can receive a plurality of learning images from the data holding unit 407 and train a neural network for face authentication. The learning images used by the learning unit 402 include extended images and may further include basic images. In the present embodiment, the neural network described in Schroff (F. Schroff et al., "FaceNet: A Unified Embedding for Face Recognition and Clustering", arXiv:1503.03832) is used, and the learning unit 402 trains the neural network using the method described in Schroff; however, the model and the training method are not limited to these.

The evaluation unit 403 uses evaluation data to evaluate the determination accuracy of the model after training by the learning unit 402. The evaluation unit 403 can perform the accuracy evaluation using evaluation data prepared separately from the original data used for learning. For example, the evaluation unit 403 can receive, from the data holding unit 407, the model obtained through training by the learning unit 402 and the evaluation images generated by the control unit 401. The evaluation unit 403 can then perform the accuracy evaluation by inputting an evaluation image, which is an image of a face region, into the model and comparing the output with the person's name associated in advance with the evaluation image (that is, the correct determination result for the evaluation image).

The processing flow of the learning method according to the present embodiment is described below with reference to the flowchart of FIG. 2.

In step S201, the evaluation unit 403 acquires evaluation images. In the present embodiment, digital images obtained by photographing people's faces with a digital camera are used as the material for creating evaluation images. Each digital image shows one person, and that person's name is known. An evaluation image can be created by having an operator visually set a face region on the digital image, so that the position of the face is determined accurately, and then cutting out the image of the set face region. A face region is a rectangular region placed on the digital image, and one face region is set per digital image. The created evaluation image can be stored in the data holding unit 407 together with information representing the person's name (that is, the correct determination result for the evaluation image). The evaluation unit 403 can acquire such evaluation images together with the information representing the person's name.

In step S202, the detection unit 107 detects a face region in each of the plurality of original images 104 as described above. The image of this face region is used as the basic image.

In step S203, the range setting unit 101 sets, as described above, the shift width used when generating extended images for each of the plurality of original images 104.

In step S204, the parameter setting unit 102 sets parameters for each of the plurality of original images 104 based on the shift width set in step S203.

In step S205, the image generation unit 103 generates extended images for each of the plurality of original images 104 based on the parameters set in step S204.

In step S206, the learning unit 402 trains the model using the learning images (the basic images obtained in step S202 and the extended images obtained in step S205).

In step S207, the evaluation unit 403 evaluates the accuracy of the model trained in step S206.

In step S208, the evaluation unit 403 determines whether to continue the learning process based on the result of the accuracy evaluation obtained in step S207. For example, the learning process can be continued when the accuracy is below a threshold. When the learning process is continued, the process returns to step S203. Otherwise, the process of FIG. 2 ends.

According to the above embodiment, parameters for generating extended images are set for each original image. Using such parameters makes it possible to generate extended images by processing that suits the characteristics of the original image. That is, the apparatus can generate extended images suitable for training the model according to how the determination target appears in the original image. Therefore, compared with using extended images generated under uniform conditions, the present embodiment can reduce the number of extended images that act as noise, which improves learning efficiency and makes it possible to generate a highly accurate model.

(Modification)
In the above embodiment, the shift width is determined according to the original data, for example according to the blur intensity of the original image. The shift width may also be varied so that the accuracy of the model improves further. In one embodiment, the range setting unit 101 updates a parameter such as the shift width according to the determination accuracy of the trained model. For example, the range setting unit 101 can determine the shift width based additionally on the accuracy evaluation result generated in step S207. Such an embodiment is described below.

In the following example, a fixed variation range is set for the shift amount (maximum shift widths of 1 to 10 are used in the example below), and a shift width that yields a highly accurate model is searched for within this range. The range setting unit 101 determines the shift width based on both the maximum shift width and each original image (for example, the blur intensity of the face region). The relationship among the maximum shift width, the blur intensity, and the shift width, as shown in Table 601, can be defined in advance as a table or a formula. That is, the range setting unit 101 can determine the parameter corresponding to the original data representing the determination target according to each of a plurality of criteria. Then, while the maximum shift width is varied, the generation of extended data by the image generation unit 103, the training of the model by the learning unit 402, and the evaluation of the model by the evaluation unit 403 are repeated. In this way, the optimum maximum shift width, which yields a more accurate model, can be determined.

As an example, a method is described that determines the optimum maximum shift width by a hill-climbing search in the direction of decreasing maximum shift width, starting from the largest candidate as the initial value. In the example of FIG. 3(B), the initial value of the maximum shift width is set to 10. In the first execution of step S203, the range setting unit 101 determines the shift width according to the blur intensity of each original image, following Table 601, which defines the criterion for a maximum shift width of 10. In the second and subsequent executions of step S203, the range setting unit 101 determines the shift width based on each original image while reducing the maximum shift width. For example, in the fourth execution of step S203, the maximum shift width is set to 7. In that case, the range setting unit 101 determines the shift width according to the blur intensity of the original image, following Table 602 shown in FIG. 6(B), which defines the criterion for a maximum shift width of 7.

In steps S204 to S207, the same processing as before is performed according to the shift width determined in step S203. For example, in step S206, the learning unit 402 trains the model using each of the plurality of learning images generated by the image generation unit 103 based on the updated parameter, such as the shift width. In the present embodiment, the evaluation unit 403 can obtain, as the accuracy evaluation result of the model, the correct authentication rate at which the false authentication rate is 0.0001%. Table 302 in FIG. 3(B) shows an example of the accuracy evaluation results obtained by running steps S203 to S208 for three loops. Table 302 shows that the accuracy evaluation results were 94%, 91%, and 90% for maximum shift widths of 8, 9, and 10, respectively.
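
The accuracy measure mentioned above (the correct authentication rate at a fixed false authentication rate) can be sketched as follows. This is a simplified illustration under assumptions that the text does not state: the model is assumed to produce a similarity score for each pair of images, a pair is accepted when its score is strictly above a threshold, and the function and argument names are hypothetical.

```python
def correct_rate_at_false_rate(genuine, impostor, target_false_rate):
    """Fraction of genuine pairs accepted at a threshold chosen so that the
    fraction of accepted impostor pairs does not exceed `target_false_rate`.

    `genuine` holds scores of same-person pairs, `impostor` holds scores of
    different-person pairs; higher scores mean higher similarity.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(impostor, reverse=True)
    # Number of impostor pairs that may be accepted without exceeding the target.
    allowed = int(target_false_rate * len(ranked))
    # Accept only scores strictly above the (allowed + 1)-th highest impostor score.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    accepted = sum(1 for score in genuine if score > threshold)
    return accepted / len(genuine)
```

A false authentication rate of 0.0001% corresponds to `target_false_rate=1e-6`, which is only meaningful when at least a million impostor pairs are available.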

In such an embodiment, in step S208 the evaluation unit 403 can determine whether to continue the process based on the relationship between the maximum shift width and the accuracy. When the optimum value has not yet been found, for example when the accuracy evaluation result of the current loop is better than that of the previous loop, the process returns to step S203. When the optimum value has been found, for example when the accuracy evaluation result of the previous loop is better than that of the current loop, the process of FIG. 2 ends.

For example, Table 301 in FIG. 3(A) shows an example of the accuracy evaluation results that would be obtained if the evaluation unit 403 evaluated every maximum shift width from 1 to 10. In this case, steps S203 to S208 are run for six loops, from a maximum shift width of 10 down to 5. Because the accuracy evaluation result for a maximum shift width of 6 is better than that for a maximum shift width of 5, the optimum maximum shift width is determined to be 6. In this way, the evaluation unit 403 can select one of a plurality of criteria that define the parameter corresponding to the original data, according to the determination accuracy of the model trained under each criterion. The method of determining the maximum shift width is not limited to this one; for example, another optimum-value search algorithm may be used.
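
The hill-climbing search over the maximum shift width can be sketched as follows. This is an illustration under assumptions: `evaluate` is a hypothetical callback standing in for one pass of steps S203 to S207 (generate extended images, train, evaluate), and the search is assumed to stop as soon as the accuracy fails to improve, including when it stays equal.

```python
from typing import Callable


def search_max_shift(evaluate: Callable[[int], float],
                     start: int = 10, stop: int = 1) -> tuple[int, float]:
    """Decrease the maximum shift width from `start` until accuracy stops improving."""
    best_width, best_accuracy = start, evaluate(start)
    for width in range(start - 1, stop - 1, -1):
        accuracy = evaluate(width)
        if accuracy <= best_accuracy:
            break  # the previous loop was at least as good, so the search ends
        best_width, best_accuracy = width, accuracy
    return best_width, best_accuracy


# Accuracies for widths 10, 9, and 8 follow Table 302; the values for
# 7, 6, and 5 are placeholders invented for this example only.
accuracy_by_width = {10: 90.0, 9: 91.0, 8: 94.0, 7: 95.0, 6: 96.0, 5: 93.0}
evaluated = []


def evaluate(width: int) -> float:
    evaluated.append(width)
    return accuracy_by_width[width]


result = search_max_shift(evaluate)
```

With these placeholder values the search evaluates six maximum shift widths (10 down to 5) and selects 6, which mirrors the six-loop example above. Because each evaluation implies a full training run, stopping at the first non-improvement keeps the number of training runs small, at the cost of possibly missing a better value beyond a local peak.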

[Embodiment 2]
In Embodiment 2, extended images are generated according to the use of the model. With this configuration, high-quality extended data can be generated by determining the parameters for data expansion according to both the content of each sample used for training and the content of the task performed with the model. The description below focuses on the differences between Embodiment 1 and Embodiment 2. Unless otherwise noted, the hardware configuration and functional configuration are the same as in Embodiment 1.

In Embodiment 1, the model was used for authenticating persons. The candidate values of the shift width used when generating extended images were the ten integers from 1 to 10, as shown in Table 601, and were set in advance according to the original data (for example, the blur intensity).

In Embodiment 2, by contrast, a plurality of uses of the model are defined in advance. For example, the type of use of the model may be selected from "face authentication" and "face detection". The user can set the type of use through user input via the operation input unit 405. The range setting unit 101 can determine the parameters according additionally to the use of the model. For example, the range setting unit 101 can determine the parameters according to both the blur intensity and the use of the model. As an example, when the use is "face authentication", the range setting unit 101 can determine the shift width using Table 601 for the "face authentication" use shown in FIG. 6(A). When the use is "face detection", the range setting unit 101 can determine the shift width using Table 603 for the "face detection" use shown in FIG. 6(C).

[Embodiment 3]
In Embodiment 1, the parameter (shift width) was determined according to the blur intensity of the image of the determination target included in the original image. However, the parameter may be determined using a factor other than the blur intensity, or a combination of several factors. For example, instead of the blur intensity, the position of the determination target in the original image, its orientation, or both may be used. Information indicating the imaging environment, such as the brightness of the environment, may also be used instead of the blur intensity. The brightness of the environment can be estimated by, for example, analyzing the pixel values of the background region near the determination target. Like the blur intensity, these factors can be used to estimate the distance between center points. For example, the closer the determination target is to the center of the image, and the closer its orientation is to frontal, the easier the detection of the determination target becomes, so the distance between center points is estimated to be small. Likewise, the brighter the environment, the easier the detection of the determination target becomes, so the distance between center points is estimated to be small.

Furthermore, metadata accompanying the original image or the basic image may be used instead of the blur intensity. For example, as described above, when a basic image is generated by detecting the determination target (for example, a face region) in the original image, a detector can be used that outputs the likelihood that the image in the original image represents the determination target. This likelihood can be used as metadata. Like the blur intensity, such a likelihood can be used to estimate the distance between center points. That is, the lower the likelihood, the harder the detection of the determination target becomes, so the distance between center points is estimated to be large. For example, if the face likelihood takes values from 0 to 100, the face likelihood for a learning image generated from an original image might be 90. In such a case, the range setting unit 101 can set the shift width corresponding to the likelihood based on the table shown in FIG. 13. In the example of FIG. 13, a shift width of 10 is used when the likelihood is 90.

The type of metadata is not limited to the likelihood. For example, information indicating the imaging conditions under which the original image was captured may be used as metadata. Examples of imaging conditions include camera position and orientation data and the camera gain value at the time of capture. For example, when the camera position and orientation data indicate that the determination target was photographed from near the front, the distance between center points is estimated to be small, so the shift width can be made large; otherwise, the distance between center points is estimated to be large, so the shift width can be made small. Similarly, when the camera gain value at the time of capture is small, the distance between center points is estimated to be small, so the shift width can be made large; otherwise, the distance between center points is estimated to be large, so the shift width can be made small.

[Embodiment 4]
In Embodiment 1, the parameters for generating extended data were determined for each original image using a factor, such as the blur intensity, from which the distance between center points is estimated. However, the present invention is not limited to this form. For example, when reference values indicating the determination target region accompany part of the original image group, the original image group can be divided into several subsets, and the parameters for generating extended data can be determined for each subset using the reference values. This method is described below, focusing on the differences between Embodiment 1 and Embodiment 4. Unless otherwise noted, the hardware configuration and functional configuration are the same as in Embodiment 1.

First, the reference value is described with reference to FIG. 14. Image 1401 is an original image, region 1402 is the region of the image (face) of the determination target detected by the apparatus using the detection unit 107, and region 1403 is the face region indicated by the reference value. The reference value represents the position of the image of the determination target detected by a method more accurate than the method used to detect region 1402. The reference value may be data created by human visual inspection, or the output of a detector that is slow but highly accurate. For simplicity, regions 1402 and 1403 are assumed below to have the same shape. Distance 1404 and distance 1405 indicate the positional error between region 1402 and region 1403 in the vertical and horizontal directions of the image, respectively. Such positional errors can be used to estimate the distance between center points for images with similar characteristics.

The procedure for estimating the distance between center points in the present embodiment is described with reference to FIG. 11. First, the original image group is divided into a plurality of original image groups (subsets) based on image characteristics. For example, the original image group may include images captured by camera C1 and images captured by camera C2. The two cameras C1 and C2 are installed at fixed positions in different environments. In this case, the original image group can be divided into a subset A consisting of the images captured by camera C1 and a subset B consisting of the images captured by camera C2. For simplicity, the original images are divided into two subsets A and B here, but the original image group may be divided into three or more subsets. The division method is also not limited to this one; for example, the original image group may be divided into images captured during the day and images captured at night.

Next, reference values are assigned to some of the images in subset A and some of the images in subset B. Each of the subsets A and B is thereby separated into an image group with reference values and an image group without reference values. Then, for the image group with reference values in subset A, the positional errors (that is, distances 1404 and 1405) between the detected face region and the region specified by the reference value are analyzed statistically. For example, a histogram of the vertical error (distance 1404) on the image can be created, and its median can be used as the vertical shift width for the image group without reference values in subset A. Similarly, the median of the horizontal error (distance 1405) on the image can be used as the horizontal shift width for the image group without reference values in subset A. Vertical and horizontal shift widths can be obtained for subset B in the same way.
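
The per-subset statistics described above can be sketched as follows. This is a minimal illustration under assumptions: each region is represented only by the (left, top) position of a same-shaped rectangle, the error is taken as an absolute pixel difference, and the function name is hypothetical.

```python
from statistics import median


def shift_widths_from_reference(detected, reference):
    """Return (vertical, horizontal) shift widths for one subset.

    `detected` and `reference` are equal-length lists of (left, top) positions
    for the images that have reference values. The median absolute error along
    each axis is used as the shift width for the images of the same subset
    that have no reference value.
    """
    if not detected or len(detected) != len(reference):
        raise ValueError("position lists must be non-empty and of equal length")
    vertical = [abs(d[1] - r[1]) for d, r in zip(detected, reference)]
    horizontal = [abs(d[0] - r[0]) for d, r in zip(detected, reference)]
    return median(vertical), median(horizontal)


# Example for subset A: three images with both a detected and a reference region.
detected_a = [(10, 10), (20, 22), (30, 35)]
reference_a = [(14, 11), (21, 20), (27, 30)]
vertical_a, horizontal_a = shift_widths_from_reference(detected_a, reference_a)
```

The same function would be called once per subset (for example once for subset A and once for subset B), giving each subset its own pair of shift widths.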

In this way, the parameters for the image group without reference values in a subset can be determined based on the error, observed in the image group with reference values of the same subset, between the position of the image of the determination target detected by the apparatus and the position indicated by the reference value. With this method, a table in which a different shift width is set for each subset can be created. The image generation unit 103 can then generate extended images from the image group without reference values in each subset according to the set shift width, in the same way as in Embodiment 1. As with Table 601, a different shift width may also be set for each subset of images whose blur intensity falls within a predetermined range.

[Embodiment 5]
The learning apparatus according to the above embodiments may further include a configuration for visually checking the generated extended images, a configuration for interactively adjusting various parameters, and a configuration for removing some of the learning images. These configurations are described below. Unless otherwise noted, the hardware configuration and functional configuration are the same as in Embodiment 1.

(Configuration for visually checking the generated extended images)
To allow visual confirmation that the intended extended images have been generated, the learning apparatus may have a function for displaying extended images. For example, when extended images are created based on the blur intensity of the image as in Embodiment 1, the display unit 404 can present a display such as that shown in FIG. 10.

(Configuration for interactively adjusting various parameters)
To allow the parameters used for generating extended images to be adjusted interactively while the results of image generation are checked visually, the learning apparatus may have a function for adjusting the parameters manually. For example, after extended images are obtained according to Embodiment 1, the user may adjust the shift width for each extended image. In this case, the display unit 404 can display the user interface shown in FIG. 12. This user interface may include an element for displaying a parameter such as the shift width. It may also include an element for receiving user input that adjusts the parameter and an element for displaying the learning image generated according to the adjusted parameter. For example, in the user interface shown in FIG. 12, when the user manually changes the shift width, an extended image generated with the changed shift width is displayed.

In FIG. 12, the shift width has been set to 6 according to Embodiment 1, and the extended image 1604 generated with this shift width is displayed together with the corresponding basic image. To change the shift width, the user enters a new shift width in box 1601 and presses the generation start button 1602. Here, the user has entered 3 as the new shift width, and the new extended image 1605 generated with this shift width is displayed. The user compares the displayed extended images 1604 and 1605 and presses button 1606 to apply the change of shift width, or button 1607 to discard it. To try a different shift width, the user can enter a new value in box 1601 again and repeat the above procedure.

(Configuration for removing learning images)
When visually checking the learning images (basic images and extended images), the user may remove specific images, such as images judged to be unsuitable for learning. In this case, the display unit 404 can display the user interface shown in FIG. 15. This user interface includes elements for receiving user input for deleting learning images. For example, the user interface shown in FIG. 15 has a check box for selecting each of the basic images and extended images. By checking a check box, the user can remove the checked image from the images used for learning.
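The effect of the check boxes can be sketched as a simple filter over the learning images. This is only an illustration: the `LearningImage` record, its fields, and the function `remove_checked` are hypothetical names that do not appear in the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LearningImage:
    name: str              # hypothetical identifier of the image
    kind: str              # "basic" or "extended"
    checked: bool = False  # True when the user has ticked the check box


def remove_checked(images: List[LearningImage]) -> List[LearningImage]:
    """Return only the images the user did not mark for removal."""
    return [img for img in images if not img.checked]


learning_images = [
    LearningImage("face_001", "basic"),
    LearningImage("face_001_shift_left", "extended", checked=True),
    LearningImage("face_001_shift_right", "extended"),
]

images_for_learning = remove_checked(learning_images)
```

Returning a new list rather than deleting in place keeps the full set available, so a removal could be undone before learning starts.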

(Other Examples)
The present invention can also be realized by processing in which a program that realizes one or more functions of the above-described embodiments is supplied to a system or device via a network or storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that realizes one or more functions.

The invention is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope of the invention. Therefore, the claims are attached to make the scope of the invention public.

101: Range setting unit, 102: Parameter setting unit, 103: Image generation unit, 107: Detection unit, 401: Control unit, 402: Learning unit, 403: Evaluation unit

Claims

A learning device that uses learning data to train a model that outputs a determination result for input data, the learning device comprising:
determination means for determining, for each piece of original data representing a determination target, editing parameters for a plurality of learning data obtained by editing the original data;
generation means for generating, from the original data and based on the parameters, a plurality of learning data each representing the determination target; and
learning means for training the model using each of the plurality of learning data.

The learning device according to claim 1, further comprising evaluation means for evaluating, using evaluation data, the determination accuracy of the model after learning, wherein
the determination means updates the parameters according to the determination accuracy of the model after learning, and
the learning means trains the model using each of the plurality of learning data generated by the generation means based on the updated parameters.

The learning device according to claim 2, wherein
the determination means determines the parameter corresponding to the original data representing the determination target according to each of a plurality of criteria, and
the evaluation means selects one of the plurality of criteria according to the determination accuracy of the model after learning under the respective criteria.

The learning device according to any one of claims 1 to 3, wherein
the model outputs a determination result for data of an image including an image of the determination target, and
the generation means generates, based on the parameter, data of a plurality of learning images including the image of the determination target from data of an original image including the image of the determination target.

The learning device according to claim 4, wherein the parameter is a position shift amount of the image to be determined between the plurality of learning images.

The learning device according to claim 4 or 5, wherein the generation means generates, as at least a part of the data of the plurality of learning images,
data of a basic image, which is an image of a partial region including the image of the determination target detected from the original image, and
data of an extended image, which is an image of the region after the position of the partial region is shifted according to the parameter.

The learning device according to claim 6, wherein the extended image includes an image of the region after the position of the partial region is shifted in the vertical direction and the horizontal direction of the image by a distance according to the parameter.

The learning device according to claim 6 or 7, wherein the determination means determines the parameter based on an estimation error between the position of the image of the determination target detected from the original image and the position of the partial region including the image of the determination target.

The learning device according to any one of claims 4 to 8, wherein the determination means determines the parameters based on the image of the determination target included in the original image.

The learning device according to any one of claims 4 to 9, wherein the determination means determines the parameter based on the blur intensity, position, or orientation of the image of the determination target, or on the likelihood that the image of the determination target represents the determination target.

The learning device according to any one of claims 4 to 10, wherein the determination means determines the parameters based on information indicating the imaging conditions or the imaging environment of the original image.

The learning device according to any one of claims 4 to 7, wherein the determination means
classifies a plurality of original images into a plurality of image groups according to the characteristics of the images, and
determines the parameters for the remaining images included in one image group based on an error, for a part of the images included in the one image group, between the position of the image of the determination target detected by a method used by the device and the position of the image of the determination target detected by a method with higher accuracy than that method.

The learning device according to any one of claims 4 to 12, further comprising a display means for displaying the learning image generated according to the parameters.

The learning device according to claim 13, wherein the display means displays a user interface including an element for receiving a user input for adjusting the parameter and an element for displaying the learning image generated according to the adjusted parameter.

The learning device according to claim 13, wherein the display means displays a user interface including an element for receiving a user input for deleting the learning image.

The learning device according to any one of claims 1 to 15, wherein
the determination result for the original data is predetermined, and
the learning means trains the model so that the model, when each of the plurality of learning data is input, outputs the predetermined determination result.

The learning device according to any one of claims 1 to 16, wherein the determination means further determines the parameters according to the use of the model.

The learning device according to any one of claims 1 to 17, wherein the model is a neural network.

A learning method that uses learning data to train a model that outputs a determination result for input data, the learning method comprising:
a step of determining, for each piece of original data representing a determination target, editing parameters for a plurality of learning data obtained by editing the original data;
a step of generating, from the original data and based on the parameters, a plurality of learning data each representing the determination target; and
a step of training the model using each of the plurality of learning data.

A program for causing a computer to function as each means of the learning device according to any one of claims 1 to 18.