JP2021111273A

JP2021111273A - Learning model generation method, program and information processor

Info

Publication number: JP2021111273A
Application number: JP2020004562A
Authority: JP
Inventors: パヌメートチェトプラユン; Phrayun Panumate Ceto; 倩穎戴; Qianying Dai
Original assignee: Mobility Technologies Co Ltd
Current assignee: Mobility Technologies Co Ltd
Priority date: 2020-01-15
Filing date: 2020-01-15
Publication date: 2021-08-02
Anticipated expiration: 2040-01-15
Also published as: JP7374001B2

Abstract

To provide a learning model generation method, a program, and an information processor that accurately output information related to an object from a photographed image by using combination data based on the photographed image including the object. SOLUTION: A learning model generation method includes the steps of: acquiring a photographed image including an object photographed by an imaging apparatus mounted on a moving object; acquiring a plurality of object extraction images obtained by extracting the object in association with the photographed image; and generating, based on the acquired photographed image, the plurality of object extraction images and training data including information related to the object, a learning model that outputs information related to the object when the photographed image and the plurality of object extraction images are input. SELECTED DRAWING: Figure 9

Description

The present technology relates to a learning model generation method, a program, and an information processing device.

Conventionally, detection methods have been proposed that detect various kinds of information from an image captured ahead of a moving body such as a vehicle. Patent Document 1 discloses a white line detection device for vehicles that is mounted on a vehicle and can more accurately detect white lines on the road on which the vehicle travels.

Japanese Unexamined Patent Publication No. 2007-220013

However, the method of Patent Document 1 has the problem that its ability to detect the target white line from a captured image is insufficient.

An object of the present disclosure is to provide a learning model generation method, a program, and an information processing device that accurately output information about an object from a captured image.

A learning model generation method according to one aspect of the present disclosure acquires a captured image that includes an object and was captured by an imaging device mounted on a moving body; acquires, in association with the captured image, a plurality of object extraction images in which the object has been extracted; and generates, based on training data that includes the acquired captured image, the plurality of object extraction images, and information about the object, a learning model that outputs information about the object when a captured image and a plurality of object extraction images are input.

According to the present disclosure, information about an object can be accurately output from a captured image.

A schematic diagram of the learning model generation system in the first embodiment.
A block diagram showing a configuration example of the information processing device.
A block diagram showing a configuration example of the control unit.
A conceptual diagram showing the method of generating a unit group value.
An explanatory diagram illustrating the contents of a data unit.
An explanatory diagram illustrating an example of a method of combining white line extraction images.
An explanatory diagram illustrating the configuration of the learning model.
A diagram showing an example of a table that stores, in matrix form, the group IDs corresponding to n and m.
A flowchart showing an example of the learning model generation procedure.
A flowchart showing an example of the detailed procedure for acquiring a unit group value.
A flowchart showing an example of the learning model generation procedure in the second embodiment.
A flowchart showing an example of the detailed procedure for acquiring a unit group value in the second embodiment.
A block diagram showing a configuration example of the estimation system in the third embodiment.
A conceptual diagram showing the estimation processing method.
A flowchart showing an example of the estimation procedure using the learning model.
A flowchart showing an example of the detailed procedure for acquiring a unit group value in the third embodiment.
A diagram showing an example of a screen displayed on the display device.
A flowchart showing an example of the estimation procedure using the learning model in the fourth embodiment.
A flowchart showing an example of the detailed procedure for acquiring a unit group value in the fourth embodiment.

The present invention will be described specifically with reference to the drawings showing its embodiments.

(First Embodiment)
FIG. 1 is a schematic diagram of the learning model generation system 100 according to the first embodiment. The learning model generation system 100 includes an information processing device 1, a control unit 200 of a moving body 2, and an imaging device 31. The information processing device 1 and the control unit 200 are communicably connected via a network N1 such as the Internet or a public line network.

The moving body 2 is equipped with a moving mechanism and is, for example, a vehicle, a motorcycle, a helicopter, a ship, or a drone. The imaging device 31 mounted on the moving body 2 photographs the outside of the moving body 2 while it is moving. In the following description, the moving body 2 is assumed to be a vehicle. The imaging device 31 is, for example, a drive recorder, that is, a device that photographs the outside of the moving body 2 with a camera and records the captured video data on a recording medium such as an SD card. In addition to an image sensor such as a camera, the imaging device 31 may include a ranging sensor such as a radar or a lidar (LIDAR: Laser Imaging Detection and Ranging). The ranging sensor emits a transmission wave, receives the wave reflected from an object, and can calculate the position and speed of the object from the reception state of the reflected wave. The imaging device 31 is connected to the control unit 200. The control unit 200 transmits the video captured by the imaging device 31 to the information processing device 1.

The information processing device 1 is, for example, a server computer. Based on the information acquired from the control unit 200, the information processing device 1 generates a learning model that outputs information about an object included in a captured image. In the first embodiment, the information processing device 1 is described as a single server computer, but its functions or processing may be distributed among a plurality of server computers, or it may be one of a plurality of server computers (instances) virtually generated on one large computer. The information processing device 1 may also be installed inside the moving body 2.

The configuration and detailed processing of the learning model generation system 100 are described below.

FIG. 2 is a block diagram showing a configuration example of the information processing device 1. The information processing device 1 includes a control unit 10, a storage unit 11, a communication unit 12, and an operation unit 13. The control unit 10 is a processor using one or more CPUs (Central Processing Units), GPUs (Graphics Processing Units), or the like, and uses built-in memory such as ROM (Read Only Memory) or RAM (Random Access Memory) to control each component and execute processing. The control unit 10 performs various kinds of information processing, control processing, and the like by reading and executing the program 1P stored in the storage unit 11.

The storage unit 11 includes non-volatile storage such as a hard disk or an SSD (Solid State Drive). The storage unit 11 stores the programs and data referred to by the control unit 10, including the program 1P. The program 1P stored in the storage unit 11 may be provided in a form recorded on a computer-readable recording medium. In that case, the storage unit 11 stores the program 1P read from the recording medium 1A by a reading device (not shown). Alternatively, the program 1P may be downloaded from an external computer (not shown) connected to a communication network (not shown) and stored in the storage unit 11. The storage unit 11 may be composed of a plurality of storage devices, or may be an external storage device connected to the information processing device 1.

The storage unit 11 further stores a plurality of learning models 1M. A learning model 1M is a classifier that identifies information about an object included in a captured image, and is generated by machine learning. The learning model 1M is defined by its definition information, which includes, for example, the structure and layer information of the learning model 1M, information on the channels of each layer, and the learned parameters. The definition information of the learning model 1M is stored in the storage unit 11. Details of the learning model 1M are described later.

The communication unit 12 is a communication interface that realizes communication via the network N1. Through the communication unit 12, the control unit 10 can establish a communication connection with the control unit 200 via the network N1.

The operation unit 13 is an interface that accepts user operations, and includes physical buttons, a mouse, and a touch panel device with a built-in display. The operation unit 13 receives operation input from the user and sends a control signal corresponding to the operation to the control unit 10.

FIG. 3 is a block diagram showing a configuration example of the control unit 200. The control unit 200 is, for example, an ECU (Electronic Control Unit) for controlling the equipment of the moving body 2, and includes a control unit 20, a storage unit 21, a first communication unit 22, a second communication unit 23, and the like.

The control unit 20 is a processor using one or more CPUs, GPUs, or the like, and uses built-in memory such as ROM and RAM to control each component and execute processing. The control unit 20 can acquire time information successively from a built-in timer. The control unit 20 executes information processing based on the program stored in the storage unit 21.

The storage unit 21 includes non-volatile memory such as an EEPROM (Electrically Erasable Programmable Read Only Memory). The storage unit 21 stores the program executed by the control unit 20, the data necessary for executing that program, and the like. The storage unit 21 may also store a log of the speed of the moving body 2 during movement, associated with the time information obtained from the timer built into the control unit 20.

The first communication unit 22 is a communication interface using a communication protocol such as CAN (Controller Area Network) or Ethernet (registered trademark). Through the first communication unit 22, the control unit 20 communicates with the various devices, other ECUs, and the like connected to the in-vehicle communication line N2 of the moving body. The devices connected to the first communication unit 22 via the communication line N2 include the imaging device 31.

The second communication unit 23 is a communication interface for wireless communication using mobile communication protocols such as 3G, LTE, 4G, 5G, and WiFi, and sends and receives data to and from the information processing device 1 via an antenna connected to the second communication unit 23. Communication between the second communication unit 23 and the information processing device 1 takes place via the external network N1, such as a public line network or the Internet.

In the learning model generation system 100 configured as described above, the information processing device 1 generates, based on the acquired captured images, a unit group value X composed of a combination of a plurality of data including captured image data. The information processing device 1 generates the learning model 1M, described later, using a training data set that includes the generated unit group value X.

A learning model determines various kinds of information by extracting features from a captured image that includes an object. When the captured image allows the object to be detected with high accuracy, a highly accurate estimation result is likely. On the other hand, when the object is unclear or partly missing in the captured image, the estimation relies on information about an object that was detected with low accuracy, so the estimation accuracy of the learning model may decrease. The learning model generation system 100 generates training data using the unit group value X, which is composed of a combination of a plurality of data with improved object detection accuracy, and thereby improves the estimation accuracy of the learning model for the object.

FIG. 4 is a conceptual diagram showing the method of generating the unit group value X. The unit group value X is obtained from the captured images through data unit generation and unit group generation. The generation of the unit group value X(t) based on the captured image at time t is described specifically with reference to FIG. 4.

The information processing device 1 first acquires a grayscale captured image and an object extraction image in association with each captured image. The captured image is taken by the imaging device 31 and acquired by the information processing device 1 via the control unit 200. The captured images are obtained as a moving image and consist of still images of a plurality of frames acquired at a predetermined frame rate, for example 60 frames per second. The captured images may instead be a set of still images acquired at predetermined intervals.

From a captured image containing RGB (Red Green Blue) values, the information processing device 1 generates a grayscale captured image. The grayscale captured image may contain pixel values from 0 to 255, or values normalized to continuous values from 0 to 1. The captured image acquired from the imaging device 31 may itself be a grayscale image.
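The grayscale conversion described above can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function name `to_grayscale`, the nested-list image representation, and the BT.601 luma weights are all assumptions, since the text does not specify how the conversion is performed.

```python
def to_grayscale(rgb_image, normalize=False):
    """Convert an image given as rows of (R, G, B) tuples (0-255) to grayscale.

    With normalize=False the result holds integers from 0 to 255;
    with normalize=True it holds floats from 0 to 1.
    """
    gray = []
    for row in rgb_image:
        gray_row = []
        for r, g, b in row:
            # Assumption: ITU-R BT.601 luma weights; the source names no formula.
            value = 0.299 * r + 0.587 * g + 0.114 * b
            if normalize:
                gray_row.append(value / 255.0)
            else:
                gray_row.append(int(round(value)))
        gray.append(gray_row)
    return gray


if __name__ == "__main__":
    frame = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
    print(to_grayscale(frame))
    print(to_grayscale(frame, normalize=True))
```

Either output range matches the text, which allows both 0 to 255 values and values normalized to the range 0 to 1.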

Further, an object extraction image is generated based on the captured image, which may be either the RGB image or the grayscale image. In the first embodiment, the object extraction image is image data obtained by extracting the object from the captured image, and is a binarized image. Objects extracted from a captured image are, for example, white lines on the road, guardrails, medians, traffic lights, road signs, vehicles, surrounding advertisements, and people. The first embodiment describes an example in which the captured image shows the white lines marking the travel path of the moving body 2, and the object extraction image is a binary white line extraction image in which the white lines are extracted as the object.

A known method may be used to generate the white line extraction image from the captured image. For example, the information processing device 1 may generate an image in which the object is extracted using a machine learning model based on an algorithm such as LaneNet or U-Net. The LaneNet model is a learning model that performs image segmentation, and generates from a captured image a binarized image in which the white lines, the object, are extracted. The information processing device 1 performs this processing on each frame of the captured images and generates a white line extraction image corresponding to each captured image.

The white line extraction image may also be generated using a technique such as pattern matching. In this case, the information processing device 1 first detects local features in the captured image and pattern-matches the detected local features against the features of the object held in advance, thereby identifying the region containing the object and generating a white line extraction image. The information processing device 1 may then binarize the generated white line extraction image based on the luminance values of the captured image to generate a binarized image. A known method may be used for the binarization. For example, for each pixel in the white line extraction image converted to grayscale, the luminance value (pixel value) of the pixel is compared with a threshold calculated by a predetermined method. When the pixel value is larger than the threshold, the information processing device 1 determines that the pixel is a black pixel and sets its pixel value to 1. When the pixel value is equal to or less than the threshold, it determines that the pixel is a white pixel and sets its pixel value to 0. The information processing device 1 performs this processing on every pixel to generate a binary white line extraction image.
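The per-pixel thresholding step can be sketched as below. This is an illustrative sketch only: the function name `binarize` is hypothetical, and using the mean intensity as the default threshold is an assumption, because the text says only that the threshold is "calculated by a predetermined method". The 1/0 assignment follows the text: values above the threshold become 1, values at or below it become 0.

```python
def binarize(gray_image, threshold=None):
    """Binarize a grayscale image given as rows of numeric pixel values.

    Pixels strictly greater than the threshold become 1; all others become 0.
    """
    if threshold is None:
        # Assumption: the mean pixel value stands in for the unspecified
        # "predetermined method" of computing the threshold.
        pixels = [p for row in gray_image for p in row]
        threshold = sum(pixels) / len(pixels)
    return [[1 if p > threshold else 0 for p in row] for row in gray_image]


if __name__ == "__main__":
    image = [[10, 200], [30, 250]]
    print(binarize(image))        # threshold defaults to the mean intensity
    print(binarize(image, 220))   # explicit threshold
```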

Although the above describes an example in which the object extraction image is generated from the captured image, the object extraction image is not limited to one generated from the captured image. When sensor data in which the object has been extracted is acquired by a lidar or the like, the information processing device 1 may convert the acquired sensor data into image data to generate the object extraction image.

Next, the information processing device 1 generates a data unit composed of a grayscale captured image and a plurality of object extraction images (white line extraction images). FIG. 5 is an explanatory diagram illustrating the contents of a data unit. The time-t data unit includes the grayscale captured image at time t and the object extraction images at time t and at the n times before and the n times after time t. That is, the time-t data unit contains the grayscale captured image at time t and the white line extraction images at times t, t-1, t+1, t-2, t+2, ..., t-n, and t+n (2n+1 images in total), where n is a positive integer. In the example of FIG. 5, n = 1, and the time-t data unit contains the grayscale captured image at time t and the white line extraction images at times t-1, t, and t+1. Similarly, the time-(t-1) data unit contains the grayscale captured image at time t-1 and the white line extraction images at times t-2, t-1, and t. The time-(t+1) data unit contains the grayscale captured image at time t+1 and the white line extraction images at times t, t+1, and t+2. Although n is described as a positive integer in the present embodiment, n may be 0. The combined white line extraction image is not limited to one that combines the same number n of white line extraction images before and after time t. The numbers of white line extraction images combined before and after time t, that is, the number on the past side and the number on the future side, may differ.
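The time window of a data unit can be sketched as follows. The names `data_unit_times`, `build_data_unit`, `n_past`, and `n_future`, and the use of dictionaries keyed by time index, are hypothetical choices made for illustration; the text does not prescribe a data layout.

```python
def data_unit_times(t, n_past, n_future=None):
    """Return the time indices of the white line extraction images in the
    time-t data unit. With n_future omitted the window is symmetric
    (2n + 1 images); otherwise past and future counts may differ."""
    if n_future is None:
        n_future = n_past
    if n_past < 0 or n_future < 0:
        raise ValueError("window sizes must be non-negative")
    return list(range(t - n_past, t + n_future + 1))


def build_data_unit(gray_frames, line_frames, t, n_past, n_future=None):
    """Assemble a data unit from dicts mapping time index -> image.

    Raises KeyError if a required frame is missing, for example near the
    start or end of a recording (boundary handling is not covered here).
    """
    times = data_unit_times(t, n_past, n_future)
    return {
        "time": t,
        "gray": gray_frames[t],
        "white_lines": [line_frames[k] for k in times],
    }


if __name__ == "__main__":
    print(data_unit_times(5, 1))     # symmetric window, n = 1
    print(data_unit_times(5, 2, 0))  # two past frames, no future frames
```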

One data unit value is generated from the data unit at each time. A data unit value is stored and processed as a single piece of data obtained, for example, by converting the pixel values of the grayscale captured image and of the combined white line extraction image into a one-dimensional or two-dimensional array format and concatenating them. In the first embodiment, the data unit value is a single piece of matrix data composed of two channels: the image data of the grayscale captured image, and the image data of the combined white line extraction image obtained by combining the 2n+1 white line extraction images.

FIG. 6 is an explanatory diagram illustrating an example of a method of combining white line extraction images. The information processing device 1 combines the white line extraction image at time t with the white line extraction images at the n times before and the n times after time t to generate a combined white line extraction image. The lower part of FIG. 6 shows, for the case of n = 1, an example in which the three white line extraction images at times t-1, t, and t+1 are combined into a combined white line extraction image. The white line extraction images are, for example, image data with the same number of pixels, each pixel having its own pixel value. The information processing device 1 combines the corresponding pixels of the white line extraction images to generate a single piece of image data. For example, when the pixels at the same position (pixel number) in all of the white line extraction images are 0, the pixel value at that position in the combined white line extraction image is set to 0. When the pixels at the same position in the white line extraction images are 0, 0, and 1, the pixel value at that position in the combined white line extraction image is set to 1. By combining the information of the preceding and following data in this way, data is generated that fills in information not detected in a single object extraction image. This method of combining a plurality of images is one example, and the combination is not limited to it.
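One way to realize the combination rule illustrated above is a pixel-wise logical OR, sketched below. Reading the rule as an OR is an interpretation: it reproduces the two cases given in the text (all zeros give 0; values 0, 0, 1 give 1), but the text does not name the operation. The function name `combine_white_line_images` is hypothetical.

```python
def combine_white_line_images(images):
    """Combine same-sized binary images (rows of 0/1) pixel by pixel.

    A pixel in the result is 1 if it is 1 in at least one input image.
    """
    if not images:
        raise ValueError("at least one image is required")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all images must have the same size")
    return [
        [1 if any(img[y][x] for img in images) else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    prev_frame = [[0, 0, 1]]
    cur_frame = [[0, 0, 0]]   # white line missed at time t
    next_frame = [[0, 1, 0]]
    print(combine_white_line_images([prev_frame, cur_frame, next_frame]))
```

In the usage example, the current frame detects nothing, and the combined image recovers the detections from the neighboring frames.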

Returning to FIG. 4, the description continues. The information processing device 1 further generates one unit group composed of one data unit or a combination of a plurality of the data units described above. The unit group at time t is composed of the data units at time t and at the m times before and the m times after time t. That is, the unit group at time t contains the time-t data unit, the time-(t-1) data unit, the time-(t+1) data unit, ..., the time-(t-m) data unit, and the time-(t+m) data unit (2m+1 data units in total), where m is 0 or a positive integer. The unit group is not limited to a combination of the same number m of data units before and after time t. The numbers of data units combined before and after time t, that is, the number on the past side and the number on the future side, may differ.

One unit group value X is generated from the unit group at time t generated as described above. A unit group value X is obtained by combining the 2m+1 data unit values, and is stored and processed as a single piece of data obtained, for example, by converting the pixel values of the 2m+1 grayscale captured images and the 2m+1 combined white line extraction images into a one-dimensional or two-dimensional array format and concatenating them. In the first embodiment, the unit group value X is a single piece of matrix data composed of 4m+2 channels: the image data of the 2m+1 grayscale captured images and the image data of the 2m+1 combined white line extraction images.
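The channel count of 4m+2 follows from stacking two channels per data unit over 2m+1 data units, as the sketch below illustrates. The function name `unit_group_value`, the dictionary input, and the interleaved channel order (grayscale, then combined white lines, for each time in ascending order) are assumptions; the text specifies the channel count but not the ordering.

```python
def unit_group_value(data_unit_values, t, m):
    """Stack the data unit values for times t-m .. t+m into a channel list.

    data_unit_values maps a time index to a pair
    (grayscale_image, combined_white_line_image).
    The result has 2 * (2m + 1) = 4m + 2 channels.
    """
    if m < 0:
        raise ValueError("m must be 0 or a positive integer")
    channels = []
    for k in range(t - m, t + m + 1):
        gray, combined = data_unit_values[k]
        # Assumption: channels are interleaved per time step.
        channels.append(gray)
        channels.append(combined)
    return channels


if __name__ == "__main__":
    # Toy 1x1 images for times 0..4.
    values = {k: ([[k * 10]], [[k % 2]]) for k in range(5)}
    x = unit_group_value(values, t=2, m=1)
    print(len(x))  # number of channels for m = 1
    print(x)
```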

Although the above describes an example in which the data unit value and the unit group value X each combine grayscale captured image data with white line extraction image data, the data they contain is not limited to this example. The data unit value and the unit group value X may each combine captured image data containing RGB values with white line extraction image data.

Using the unit group value X described above, the information processing device 1 generates the learning model 1M. FIG. 7 is an explanatory diagram illustrating the configuration of the learning model 1M. The learning model 1M is generated and trained by deep learning using a neural network. The learning model 1M is, for example, a CNN (Convolutional Neural Network). In the example shown in FIG. 7, the learning model 1M includes an input layer that receives the unit group value X based on the captured images, an output layer that outputs information about the object, and intermediate layers (hidden layers) that extract features from the image data.

制御部１０は、撮像画像に対し、上述のようにグレースケール撮像画像及び対象物抽出画像を生成し、これらを組み合わせてなるデータユニット及びユニットグループに基づきユニットグループ値Ｘを生成する前処理を実行する。第１実施形態における学習モデル１Ｍとは、狭義の学習モデル１Ｍと、上述の前処理を含めた広義の学習モデルモジュールとを含む。 The control unit 10 executes preprocessing in which it generates a grayscale captured image and an object extracted image from the captured image as described above, and generates the unit group value X based on the data units and the unit group formed by combining them. The learning model 1M in the first embodiment covers both the learning model 1M in the narrow sense and a learning model module in the broad sense that includes the above preprocessing.

学習モデル１Ｍの入力層へ入力される入力データは、ユニットグループ値Ｘである。第１実施形態では、ユニットグループ値Ｘは、移動体２の走行路を示す白線を撮影したグレースケール撮像画像データと、対象物である白線を抽出した白線抽出画像データとの２チャンネルのデータを含む。白線抽出画像は、撮像画像に掛け合わせられるマスク画像としての機能を有する。 The input data supplied to the input layer of the learning model 1M is the unit group value X. In the first embodiment, the unit group value X includes two channels of data: grayscale captured image data obtained by photographing the white lines indicating the traveling path of the moving body 2, and white line extracted image data obtained by extracting the white lines, which are the object. The white line extracted image functions as a mask image that is multiplied with the captured image.

中間層は、例えば、畳み込み層、プーリング層及び全結合層により構成される。畳み込み層及びプーリング層は交互に複数設けられてもよい。畳み込み層及びプーリング層は、各層のチャネルを用いた演算によって、入力層を通じて入力される撮像画像データ及び白線抽出画像データの特徴を抽出する。全結合層は、畳み込み層及びプーリング層によって特徴部分が抽出されたデータを１つのノードに結合し、活性化関数によって変換された特徴量を出力する。特徴量は、全結合層を通じて出力層へ出力される。 The intermediate layer is composed of, for example, a convolution layer, a pooling layer and a fully connected layer. A plurality of convolution layers and pooling layers may be provided alternately. The convolution layer and the pooling layer extract the features of the captured image data and the white line extracted image data input through the input layer by the calculation using the channels of each layer. The fully connected layer combines the data whose feature portions are extracted by the convolution layer and the pooling layer into one node, and outputs the feature amount converted by the activation function. The feature quantity is output to the output layer through the fully connected layer.

学習モデル１Ｍの出力層から出力される出力データは、対象物に関する情報である。第１実施形態では、対象物に関する情報として、移動体２が所在する走行路上の位置を示す、移動体２の所在車線を出力する。出力層は、設定されている所在車線に各々対応するチャネルを含み、各所在車線に対する確度をスコアとして出力する。情報処理装置１は、スコアが最も高い所在車線、あるいはスコアが閾値以上である所在車線を出力層の出力値とすることができる。なお出力層は、それぞれの所在車線の確度を出力する複数の出力チャネルを有する代わりに、最も確度の高い所在車線を出力する１個の出力チャネルを有してもよい。このように、学習モデル１Ｍは、ユニットグループ値Ｘが入力された場合に、対象物に関する情報を出力する。 The output data output from the output layer of the learning model 1M is information about the object. In the first embodiment, as information about the object, the location lane of the moving body 2 indicating the position on the traveling path where the moving body 2 is located is output. The output layer includes channels corresponding to the set location lanes, and outputs the accuracy for each location lane as a score. The information processing device 1 can use the location lane having the highest score or the location lane having a score equal to or higher than the threshold value as the output value of the output layer. The output layer may have one output channel that outputs the most accurate location lane instead of having a plurality of output channels that output the accuracy of each location lane. In this way, the learning model 1M outputs information about the object when the unit group value X is input.
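The two ways of turning per-lane scores into an output value described above (highest score, or every score at or above a threshold) can be illustrated with a minimal sketch. The function name `select_lane`, the zero-based lane indices, and the example scores are assumptions made for illustration and do not appear in the original description.

```python
from typing import List, Optional


def select_lane(scores: List[float], threshold: Optional[float] = None) -> List[int]:
    """Pick location lanes from per-lane scores.

    Without a threshold, only the highest-scoring lane is returned.
    With a threshold, every lane whose score is >= threshold is returned.
    """
    if not scores:
        return []
    if threshold is None:
        best = max(range(len(scores)), key=lambda i: scores[i])
        return [best]
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical scores for three lanes.
lane_scores = [0.1, 0.7, 0.2]
print(select_lane(lane_scores))        # highest-scoring lane only
print(select_lane(lane_scores, 0.15))  # all lanes at or above the threshold
```

A single-output-channel variant, as mentioned above, would correspond to returning only the first form.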

上記の学習モデル１Ｍは、上述したｎ及びｍの値に応じて異なる学習モデル１Ｍが用意される。すなわち、学習モデル１Ｍに入力するユニットグループ値Ｘを生成するための、結合白線抽出画像に含める白線抽出画像の枚数及び組み合わせるデータユニット値の個数に応じて、異なる学習モデル１Ｍが生成される。各学習モデル１Ｍの構成は、同じ構成を有するものであってもよく、中間層の層数が異なる構成であってもよい。情報処理装置１は、例えばユーザの入力を受け付ける等によりｎ及びｍの値を決定する。情報処理装置１は、決定したｎ及びｍの組み合わせにより特定されるグループ毎に撮像画像を分別し、該グループ別に学習モデル１Ｍを生成する。図８は、ｎ及びｍに対応するグループＩＤをマトリクス状に記憶したテーブル例を示す図である。例えば、ｎ＝１且つｍ＝０の場合はグループ１、ｎ＝１且つｍ＝１の場合はグループ２、…が記憶される。情報処理装置１は、グループＩＤ別に異なる訓練データを用いて学習モデル１Ｍを生成する。 Different learning models 1M are prepared according to the values of n and m described above. That is, a different learning model 1M is generated according to the number of white line extracted images included in the combined white line extracted image and the number of data unit values combined to generate the unit group value X that is input to the learning model 1M. The learning models 1M may all have the same configuration, or may differ in the number of intermediate layers. The information processing device 1 determines the values of n and m, for example by accepting user input. The information processing device 1 sorts the captured images into the groups specified by the determined combinations of n and m, and generates a learning model 1M for each group. FIG. 8 is a diagram showing an example of a table in which group IDs corresponding to n and m are stored in matrix form. For example, group 1 is stored for n = 1 and m = 0, group 2 for n = 1 and m = 1, and so on. The information processing device 1 generates the learning models 1M using different training data for each group ID.
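The lookup of a group ID from a combination of n and m, and the sorting of samples by group, might look like the following sketch. Only the first two table entries come from the text; the remaining entries, the name `GROUP_TABLE`, and the function `group_samples` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# (n, m) -> group ID. The text states only (1, 0) -> 1 and (1, 1) -> 2;
# the other entries are hypothetical placeholders.
GROUP_TABLE: Dict[Tuple[int, int], int] = {
    (1, 0): 1,
    (1, 1): 2,
    (2, 0): 3,  # hypothetical
    (2, 1): 4,  # hypothetical
}


def group_samples(samples: Iterable[Tuple[int, int, Any]]) -> Dict[int, List[Any]]:
    """Partition (n, m, sample) triples by the group ID of their (n, m) pair."""
    groups: Dict[int, List[Any]] = defaultdict(list)
    for n, m, sample in samples:
        groups[GROUP_TABLE[(n, m)]].append(sample)
    return dict(groups)


# One model would then be trained per key of the returned dictionary.
print(group_samples([(1, 0, "x0"), (1, 1, "x1"), (1, 0, "x2")]))
```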

上記では学習モデル１ＭがＣＮＮであるものとして説明したが、学習モデル１ＭはＣＮＮに限定されるものではない。時系列データを取得した場合にはＣＮＮ以外のニューラルネットワーク、例えばリカレントニューラルネットワーク（ＲＮＮ：Recurrent Neural Network）、ＬＳＴＭ（Long Short Term Memory）ネットワークを用いてもよい。またニューラルネットワークを用いないサポートベクタマシン、回帰木等、他のアルゴリズムによって学習されたモデルであってもよい。 Although the learning model 1M has been described above as being a CNN, the learning model 1M is not limited to the CNN. When time series data is acquired, a neural network other than CNN, for example, a recurrent neural network (RNN: Recurrent Neural Network) or an LSTM (Long Short Term Memory) network may be used. Further, the model may be a model learned by another algorithm such as a support vector machine that does not use a neural network or a regression tree.

なお、学習モデル１Ｍの内容例が図７に示した例に限られないことは勿論である。学習モデル１Ｍは、撮像画像に含まれる対象物に応じて適宜入力情報に応じた出力情報を出力するように学習されるとよい。例えば、学習モデル１Ｍは、対象物である広告に応じて広告の分類又は広告数を出力としてもよい。学習モデル１Ｍは、対象物である人に応じて人数を出力としてもよい。学習モデル１Ｍは、対象物である車両に応じて車両数を出力としてもよい。 Needless to say, the content example of the learning model 1M is not limited to the example shown in FIG. The learning model 1M may be trained to output output information corresponding to the input information as appropriate according to the object included in the captured image. For example, the learning model 1M may output the classification of advertisements or the number of advertisements according to the advertisements that are the objects. The learning model 1M may output the number of people according to the person who is the object. The learning model 1M may output the number of vehicles according to the vehicle that is the object.

図９は、学習モデル１Ｍの生成処理手順の一例を示すフローチャートである。以下の処理は、情報処理装置１の記憶部１１に記憶してあるプログラム１Ｐに従って制御部１０によって実行される。処理の実行タイミングは、例えば定期的なタイミングであってもよく、撮像装置３１により新たな撮像画像が記録され制御ユニット２００から送信されたタイミングであってもよい。 FIG. 9 is a flowchart showing an example of the generation processing procedure of the learning model 1M. The following processing is executed by the control unit 10 according to the program 1P stored in the storage unit 11 of the information processing device 1. The execution timing of the process may be, for example, a periodic timing, or may be a timing at which a new captured image is recorded by the imaging device 31 and transmitted from the control unit 200.

情報処理装置１の制御部１０は、制御ユニット２００から撮像画像を取得し（ステップＳ１１）、取得した撮像画像を一時的に記憶部１１に記憶する。撮像画像は、撮像装置３１により撮影し記録された移動体２の外部を撮影した画像であり、移動体２の走行路を示す白線が含まれる。 The control unit 10 of the information processing device 1 acquires a captured image from the control unit 200 (step S11), and temporarily stores the acquired captured image in the storage unit 11. The captured image is an image of the outside of the moving body 2 photographed and recorded by the imaging device 31, and includes white lines indicating the traveling path of the moving body 2.

制御部１０は、記憶部１１にアクセスし、撮像画像を取得する。制御部１０は、撮像画像をグレースケールに変換し、グレースケール撮像画像を取得する（ステップＳ１２）。なお、制御部１０は、制御ユニット２００からグレースケールの撮像画像を取得してもよい。制御部１０は、例えばＬａｎｅＮｅｔ等の機械学習モデルを用いて、撮像画像から対象物の抽出を行い、対象物抽出画像である白線抽出画像を取得する（ステップＳ１３）。白線抽出画像は、２値化画像である。制御部１０は、グレースケール撮像画像と白線抽出画像とを関連付けて記憶する。制御部１０は撮像画像に含まれる全フレームに対して上記の処理を実行する。 The control unit 10 accesses the storage unit 11 and acquires the captured image. The control unit 10 converts the captured image into grayscale and acquires a grayscale captured image (step S12). The control unit 10 may instead acquire a grayscale captured image from the control unit 200. The control unit 10 extracts the object from the captured image using a machine learning model such as LaneNet, and acquires a white line extracted image, which is the object extracted image (step S13). The white line extracted image is a binarized image. The control unit 10 stores the grayscale captured image and the white line extracted image in association with each other. The control unit 10 executes the above processing for all frames included in the captured image.

なお、制御部１０は、ライダー等の測距センサによるセンサデータを取得した場合には、撮像画像に基づく白線抽出画像の生成に代えて、センサデータにより白線抽出画像を生成してよい。この場合において、撮像画像及びセンサデータは夫々取得時点に関する情報が付随しており、制御部１０は、撮像画像に同一時点のセンサデータを対応付けて取得するとよい。 When the control unit 10 acquires sensor data from a distance measuring sensor such as a LiDAR, it may generate the white line extracted image from the sensor data instead of from the captured image. In this case, the captured image and the sensor data are each accompanied by information on the time of acquisition, and the control unit 10 may acquire the captured image in association with the sensor data of the same time.

制御部１０は、取得したグレースケール撮像画像及び白線抽出画像に基づき、ユニットグループ値Ｘを取得する（ステップＳ１４）。図１０は、ユニットグループ値Ｘ取得の詳細な手順の一例を示すフローチャートである。図１０のフローチャートに示す処理手順は、図９のフローチャートにおけるステップＳ１４の詳細に対応する。 The control unit 10 acquires the unit group value X based on the acquired grayscale captured image and white line extracted image (step S14). FIG. 10 is a flowchart showing an example of a detailed procedure for acquiring the unit group value X. The processing procedure shown in the flowchart of FIG. 10 corresponds to the details of step S14 in the flowchart of FIG. 9.

制御部１０は、例えば操作部１３によりユーザの入力を受け付ける等により、ユニットグループ値Ｘを生成するためのｎ及びｍの値を決定する（ステップＳ１４１）。制御部１０は、決定したｎの値に基づき、時刻ｔにおけるグレースケール撮像画像と、時刻ｔ及び時刻ｔの前後ｎ個の時刻、すなわち時刻ｔ-nから時刻ｔ+nまでの各時刻における白線抽出画像（合計２ｎ＋１個）とで構成される時刻ｔデータユニットを生成する（ステップＳ１４２）。制御部１０は、生成した時刻ｔデータユニットにおけるグレースケール撮像画像データと、各時刻の白線抽出画像に基づく結合白線抽出画像データとを組み合わせたデータユニット値を取得する（ステップＳ１４３）。 The control unit 10 determines the values of n and m for generating the unit group value X, for example by accepting user input via the operation unit 13 (step S141). Based on the determined value of n, the control unit 10 generates a time t data unit composed of the grayscale captured image at time t and the white line extracted images (2n + 1 in total) at time t and the n times before and after it, that is, at each time from time t-n to time t+n (step S142). The control unit 10 acquires a data unit value that combines the grayscale captured image data in the generated time t data unit with the combined white line extracted image data based on the white line extracted images at the respective times (step S143).

制御部１０は、決定したｍの値に基づき、時刻ｔ及び時刻ｔの前後ｍ個の時刻、すなわち時刻ｔ-mから時刻ｔ+mまでの各時刻におけるデータユニット（合計２ｍ＋１個）で構成されるユニットグループを生成する（ステップＳ１４４）。制御部１０は、生成したユニットグループの各時刻におけるデータユニット値を組み合わせたユニットグループ値Ｘを取得し（ステップＳ１４５）、図９のフローチャートにおけるステップＳ１５へ処理を戻す。 Based on the determined value of m, the control unit 10 generates a unit group composed of the data units (2m + 1 in total) at time t and the m times before and after it, that is, at each time from time t-m to time t+m (step S144). The control unit 10 acquires the unit group value X that combines the data unit values at the respective times of the generated unit group (step S145), and returns the process to step S15 in the flowchart of FIG. 9.
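Steps S142 to S145 can be sketched as follows, with each image held as a nested list of pixel values. This is an illustrative sketch only: the function names are hypothetical, and merging the 2n + 1 white line extracted images by a per-pixel logical OR is an assumption, since this passage does not state how the combined white line extracted image is formed.

```python
from typing import List, Sequence

Image = List[List[int]]  # one channel: rows of pixel values


def combine_masks(masks: Sequence[Image]) -> Image:
    """Merge binary masks pixel by pixel (logical OR, an assumed rule)."""
    height, width = len(masks[0]), len(masks[0][0])
    return [[int(any(mask[y][x] for mask in masks)) for x in range(width)]
            for y in range(height)]


def data_unit_value(gray: Sequence[Image], masks: Sequence[Image],
                    t: int, n: int) -> List[Image]:
    """Two channels for time t: grayscale frame, combined mask of t-n..t+n."""
    return [gray[t], combine_masks(masks[t - n:t + n + 1])]


def unit_group_value(gray: Sequence[Image], masks: Sequence[Image],
                     t: int, n: int, m: int) -> List[Image]:
    """Concatenate the data unit values of t-m..t+m into 4m + 2 channels."""
    if t - m - n < 0 or t + m + n >= len(gray) or len(gray) != len(masks):
        raise ValueError("not enough frames around time t for the given n and m")
    channels: List[Image] = []
    for u in range(t - m, t + m + 1):
        channels.extend(data_unit_value(gray, masks, u, n))
    return channels


# Seven hypothetical 1x2 frames; only frame 3 contains a white-line pixel.
gray_frames = [[[i, i]] for i in range(7)]
mask_frames = [[[1 if i == 3 else 0, 0]] for i in range(7)]
x = unit_group_value(gray_frames, mask_frames, t=3, n=1, m=1)
print(len(x))  # number of channels, 4m + 2
```

Frames closer than n + m to either end of the sequence have no complete unit group in this sketch, so they are rejected rather than padded; the description does not say how such frames are handled.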

図９に戻り説明を続ける。制御部１０は、取得したユニットグループ値Ｘに、対象物に関する情報（例えば移動体２の所在車線）をラベル付けした訓練データセットを生成する（ステップＳ１５）。制御部１０は、撮像画像の各フレームに対し上述の処理を行い、ユニットグループ値Ｘに所在車線を夫々ラベル付けした複数の訓練データセットを生成する。制御部１０は、大量の撮像画像に基づくユニットグループ値Ｘとラベルデータとを収集し、収集したデータを訓練データセットとして不図示のデータベースに記憶する。この場合において、ユニットグループ値Ｘは、ｎ及びｍの値を付随させて記憶される。 Returning to FIG. 9, the description continues. The control unit 10 generates a training data set in which the acquired unit group value X is labeled with information about the object (for example, the location lane of the moving body 2) (step S15). The control unit 10 performs the above processing for each frame of the captured image and generates a plurality of training data sets in which each unit group value X is labeled with a location lane. The control unit 10 collects unit group values X based on a large number of captured images together with the label data, and stores the collected data as training data sets in a database (not shown). In this case, each unit group value X is stored together with its values of n and m.

制御部１０は、ユニットグループ値Ｘに付随するｎ及びｍの値に基づき、図８に示したテーブルを参照して、ｎ及びｍの値の組み合わせにより特定されるグループＩＤを取得し、ユニットグループ値Ｘをグループに分別する（ステップＳ１６）。 Based on the values of n and m associated with the unit group value X, the control unit 10 refers to the table shown in FIG. 8, acquires the group ID specified by the combination of the values of n and m, and sorts the unit group value X into its group (step S16).

制御部１０は、生成した訓練データセットを用いて、時刻ｔにおけるユニットグループ値Ｘを入力した場合に、時刻ｔにおける所在車線を出力する学習モデル１Ｍを生成する（ステップＳ１７）。具体的には、制御部１０は、データベースにアクセスし、学習モデル１Ｍの生成に用いる１組の訓練データセットを取得する。訓練データセットは、ユニットグループ値Ｘと所在車線とを含む。制御部１０は、ユニットグループ値Ｘを学習モデル１Ｍの入力層に入力する。この場合において、制御部１０は、ユニットグループの構成に応じたグループＩＤ別に異なる学習モデル１Ｍ、１Ｍ、１Ｍ…を生成するため、分別したユニットグループ値Ｘのグループ別に対応する学習モデル１Ｍに訓練データを入力する。 Using the generated training data sets, the control unit 10 generates a learning model 1M that outputs the location lane at time t when the unit group value X at time t is input (step S17). Specifically, the control unit 10 accesses the database and acquires a set of training data sets used for generating the learning model 1M. Each training data set includes a unit group value X and a location lane. The control unit 10 inputs the unit group value X to the input layer of the learning model 1M. In this case, because the control unit 10 generates different learning models 1M, 1M, 1M... for each group ID according to the configuration of the unit group, it inputs the training data into the learning model 1M corresponding to each group of the sorted unit group values X.

制御部１０は、所在車線の予測値を出力層から取得する。学習が開始される前の段階では、学習モデル１Ｍを記述する定義情報には、初期設定値が与えられているものとする。制御部１０は、例えば誤差逆伝播法を用いて、所在車線の予測値と正解値である所在車線とを比較し、差分が小さくなるように中間層におけるパラメータ及び重み等を学習する。差分の大きさ、学習回数が所定基準を満たすことによって学習が完了すると、最適化されたパラメータが得られる。制御部１０は、各グループ別に生成した学習モデル１Ｍ、１Ｍ、１Ｍ…を記憶部１１に格納し、一連の処理を終了する。 The control unit 10 acquires the predicted value of the location lane from the output layer. Before learning starts, the definition information describing the learning model 1M is assumed to hold initial setting values. The control unit 10 compares the predicted value of the location lane with the location lane that is the correct value and, using for example the error backpropagation method, learns the parameters, weights, and the like in the intermediate layer so that the difference becomes small. When learning is completed because the magnitude of the difference and the number of learning iterations satisfy predetermined criteria, optimized parameters are obtained. The control unit 10 stores the learning models 1M, 1M, 1M... generated for each group in the storage unit 11 and ends the series of processes.

本実施形態によれば、撮像画像から生成される複数のデータを組み合わせたユニットグループ値Ｘを含む訓練データセットを用いて学習モデル１Ｍが生成される。学習モデル１Ｍは、マスク画像となる対象物の抽出精度を高めた対象物抽出画像と、撮像画像とを含む入力データを用いることにより、１フレームの画像をそのまま使用する場合に比べて精度の高い推定処理が可能となる。 According to this embodiment, the learning model 1M is generated using the training data set including the unit group value X that combines a plurality of data generated from the captured image. The learning model 1M has higher accuracy than the case where one frame image is used as it is by using the input data including the object extraction image in which the extraction accuracy of the object to be the mask image is improved and the captured image. Estimation processing becomes possible.

学習モデル１Ｍは、データの組み合わせ内容に応じて複数の学習モデル１Ｍが用意される。各学習モデル１Ｍは、個々に固定されたｎ及びｍの値に基づくユニットグループ値Ｘを含む訓練データにより学習することができ、検出精度を高めることが可能となる。 As the learning model 1M, a plurality of learning models 1M are prepared according to the combination contents of data. Each learning model 1M can be trained by training data including a unit group value X based on individually fixed values of n and m, and it is possible to improve the detection accuracy.

（第２実施形態） (Second Embodiment)
第２実施形態では、撮像画像に応じた各種の情報に基づき、ｎ及びｍの値が決定される。以下では、第２実施形態について、第１実施形態と異なる点を説明する。後述する構成を除く他の構成については第１実施形態の学習モデル生成システム１００と同様であるので、共通する構成については同一の符号を付してその詳細な説明を省略する。第２実施形態における情報処理装置１は、制御ユニット２００から、移動体の速度及び撮像画像のフレームレート等を含む付加情報と、撮像装置３１により撮影された動画とを受信する。 In the second embodiment, the values of n and m are determined based on various information according to the captured image. Hereinafter, the differences between the second embodiment and the first embodiment will be described. Since the configurations other than those described later are the same as those of the learning model generation system 100 of the first embodiment, the common configurations are designated by the same reference numerals and detailed description thereof will be omitted. The information processing device 1 in the second embodiment receives, from the control unit 200, additional information including the speed of the moving body and the frame rate of the captured image, together with the moving image captured by the imaging device 31.

第２実施形態におけるｎの値は、対象物抽出画像における対象物の検出精度に基づき決定される。検出精度は、例えばＬａｎｅＮｅｔから得られる出力精度、又は対象物抽出画像におけるパターンマッチングの相関値等により導出されてよい。検出精度は、例えば精度が高い順に高/中/低等の複数の段階に分別される。情報処理装置１は、予め検出精度の各段階とｎの値とを関連付けた不図示のテーブルを記憶している。対象物の検出精度が低い場合にはｎの値は大きくなるよう設定される。すなわち、対象物の検出精度が低い場合には、より多くの前後の対象物抽出画像を使用することで学習モデルの検出精度を向上させる。一方、対象物の検出精度が高い場合にはｎの値は小さくなるよう設定される。対象物の検出精度が高い場合には、少ない数の対象物抽出画像であっても学習モデルから高い検出精度が得られるからである。なお、検出精度の導出方法は上記の例に限定されるものでない。検出精度は、その他対象物抽出画像と該対象物抽出画像における対象物の検出精度とを学習した機械学習モデル等を用いて推定されてもよい。 The value of n in the second embodiment is determined based on the detection accuracy of the object in the object extracted image. The detection accuracy may be derived from, for example, the output accuracy obtained from LaneNet, the correlation value of pattern matching in the object extraction image, or the like. The detection accuracy is classified into a plurality of stages such as high / medium / low in descending order of accuracy, for example. The information processing device 1 stores in advance a table (not shown) in which each stage of detection accuracy and a value of n are associated with each other. When the detection accuracy of the object is low, the value of n is set to be large. That is, when the detection accuracy of the object is low, the detection accuracy of the learning model is improved by using more front and rear object extraction images. On the other hand, when the detection accuracy of the object is high, the value of n is set to be small. This is because when the detection accuracy of the object is high, the high detection accuracy can be obtained from the learning model even if the number of objects extracted is small. The method for deriving the detection accuracy is not limited to the above example. The detection accuracy may be estimated by using a machine learning model or the like that has learned the object extraction image and the detection accuracy of the object in the object extraction image.
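The mapping from detection accuracy to n could be sketched as a two-stage lookup: classify the accuracy into a level, then read n from a table. The cut-off values (0.8 and 0.5) and the n values below are hypothetical; the description states only that lower accuracy corresponds to a larger n and that the table itself is not shown.

```python
# Hypothetical table: level -> n. Lower accuracy maps to a larger n.
N_BY_LEVEL = {"high": 1, "medium": 2, "low": 3}


def accuracy_level(score: float) -> str:
    """Classify a detection accuracy score in [0, 1] (cut-offs are assumed)."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


def choose_n(score: float) -> int:
    """Look up n for the accuracy level of the given score."""
    return N_BY_LEVEL[accuracy_level(score)]


for s in (0.9, 0.6, 0.1):
    print(s, accuracy_level(s), choose_n(s))
```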

第２実施形態におけるｍの値は、撮像画像の１フレームの移動距離に基づき決定される。１フレームの移動距離は、移動体２の速度を撮像画像のフレームレートで除算することで得られ、例えば大/中/小等の複数の段階に分別される。情報処理装置１は、予め移動距離とｍの値とを関連付けた不図示のテーブルを記憶している。移動距離が大きい程、ｍの値は大きくなるよう設定される。速度が速い程移動距離は大きくなるため、ｍの値を大きく、すなわちより多くのデータユニットを組み合わせるよう設定することで、学習モデル１Ｍの検出精度を向上させる。フレームレートが低い程移動距離は大きくなるため、ｍの値は大きくなるよう設定される。なお、移動距離が所定の閾値を超える場合には、連続するフレームが大きく異なる画像である虞があるため、ｍの値は小さく、例えば０に設定されるとよい。 The value of m in the second embodiment is determined based on the moving distance of one frame of the captured image. The moving distance of one frame is obtained by dividing the speed of the moving body 2 by the frame rate of the captured image, and is classified into a plurality of stages such as large / medium / small. The information processing device 1 stores a table (not shown) in which the moving distance and the value of m are associated with each other in advance. The larger the moving distance, the larger the value of m is set. Since the moving distance increases as the speed increases, the detection accuracy of the learning model 1M is improved by setting the value of m to be large, that is, to combine more data units. Since the moving distance increases as the frame rate decreases, the value of m is set to increase. If the moving distance exceeds a predetermined threshold value, the continuous frames may be images that are significantly different. Therefore, the value of m is small, for example, it may be set to 0.
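The per-frame travel distance and the resulting choice of m can be sketched in the same way. The units (metres per second, frames per second), the level boundaries, the cut-off above which m falls back to 0, and the m values are all hypothetical; the description gives only the division of speed by frame rate, the rule that a larger distance corresponds to a larger m, and the fallback to a small m such as 0 beyond a threshold.

```python
def travel_per_frame(speed_mps: float, frame_rate_fps: float) -> float:
    """Distance covered during one frame: speed divided by frame rate."""
    if frame_rate_fps <= 0:
        raise ValueError("frame rate must be positive")
    return speed_mps / frame_rate_fps


def choose_m(distance_m: float, cutoff_m: float = 2.0) -> int:
    """Pick m from the per-frame distance (all boundaries are assumed)."""
    if distance_m > cutoff_m:
        return 0  # consecutive frames may differ too much to be combined
    if distance_m >= 1.0:
        return 3  # "large"
    if distance_m >= 0.5:
        return 2  # "medium"
    return 1      # "small"


d = travel_per_frame(15.0, 30.0)  # 15 m/s recorded at 30 frames per second
print(d, choose_m(d))
```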

図１１は、第２実施形態における学習モデル１Ｍの生成処理手順の一例を示すフローチャートである。以下の処理は、情報処理装置１の記憶部１１に記憶してあるプログラム１Ｐに従って制御部１０によって実行される。第１実施形態における図９と共通する処理については同一のステップ番号を付してその詳細な説明を省略する。 FIG. 11 is a flowchart showing an example of the generation processing procedure of the learning model 1M in the second embodiment. The following processing is executed by the control unit 10 according to the program 1P stored in the storage unit 11 of the information processing device 1. The same step numbers are assigned to the processes common to those in FIG. 9 in the first embodiment, and detailed description thereof will be omitted.

情報処理装置１の制御部１０は、制御ユニット２００から撮像画像及び付加情報を取得し（ステップＳ２１）、取得した撮像画像及び付加情報を一時的に記憶部１１に記憶する。撮像画像には、移動体２の走行路を示す白線が含まれる。付加情報は、例えば移動体２の速度の履歴データ及び撮像画像のフレームレート等が含まれる。付加情報には、時刻情報が対応付けられている。なお、速度データは、移動体２のスピードメータから取得してもよく、移動体２のＧＰＳ（Global Positioning System）データ等の位置情報に基づき取得してもよい。 The control unit 10 of the information processing device 1 acquires a captured image and additional information from the control unit 200 (step S21), and temporarily stores the acquired captured image and additional information in the storage unit 11. The captured image includes white lines indicating the traveling path of the moving body 2. The additional information includes, for example, history data of the speed of the moving body 2 and the frame rate of the captured image. Time information is associated with the additional information. The speed data may be acquired from the speedometer of the moving body 2, or may be derived from position information such as GPS (Global Positioning System) data of the moving body 2.

制御部１０は、撮像画像から、グレースケール撮像画像を取得し（ステップＳ１２）、対象物である白線を抽出した２値の対象物抽出画像（白線抽出画像）を取得する（ステップＳ１３）。制御部１０は、取得したグレースケール撮像画像及び白線抽出画像に基づき、ユニットグループ値Ｘを取得する（ステップＳ１４）。図１２は、第２実施形態におけるユニットグループ値Ｘ取得の詳細な手順の一例を示すフローチャートである。図１２のフローチャートに示す処理手順は、図１１のフローチャートにおけるステップＳ１４の詳細に対応する。第１実施形態における図１０と共通する処理については同一のステップ番号を付してその詳細な説明を省略する。 The control unit 10 acquires a grayscale captured image from the captured image (step S12), and acquires a binary object extracted image (white line extracted image) in which the white lines, which are the object, are extracted (step S13). The control unit 10 acquires the unit group value X based on the acquired grayscale captured image and white line extracted image (step S14). FIG. 12 is a flowchart showing an example of a detailed procedure for acquiring the unit group value X in the second embodiment. The processing procedure shown in the flowchart of FIG. 12 corresponds to the details of step S14 in the flowchart of FIG. 11. The same step numbers are assigned to the processes common to those in FIG. 10 in the first embodiment, and detailed description thereof will be omitted.

制御部１０は、例えばＬａｎｅＮｅｔの検出精度を示すスコア等に基づき、時刻ｔにおける白線抽出画像に対応する対象物の検出精度を取得する（ステップＳ２４１）。ついで、制御部１０は、付加情報として取得した移動体２の速度及び撮像画像のフレームレートに基づき、時刻ｔにおける移動距離を取得する（ステップＳ２４２）。制御部１０は、不図示のテーブルを夫々参照して、取得した検出精度及び移動距離に基づき、時刻ｔデータユニットを生成するためのｎ及びｍの値を決定する（ステップＳ１４１）。 The control unit 10 acquires the detection accuracy of the object corresponding to the white line extracted image at time t based on, for example, a score indicating the detection accuracy of LaneNet (step S241). Next, the control unit 10 acquires the moving distance at time t based on the speed of the moving body 2 acquired as additional information and the frame rate of the captured image (step S242). The control unit 10 refers to each of the tables (not shown) and determines the values of n and m for generating the time t data unit based on the acquired detection accuracy and the movement distance (step S141).

制御部１０は、特定したｎの値に基づき、時刻ｔにおけるグレースケール撮像画像と白線抽出画像（合計２ｎ＋１個）とで構成される時刻ｔデータユニットを生成する（ステップＳ１４２）。制御部１０は、生成した時刻ｔデータユニットの結合白線抽出画像データと、グレースケール撮像画像データとを組み合わせたデータユニット値を取得する（ステップＳ１４３）。 The control unit 10 generates a time t data unit composed of a grayscale captured image and a white line extracted image (total 2n + 1) at time t based on the specified value of n (step S142). The control unit 10 acquires a data unit value in which the combined white line extraction image data of the generated time t data unit and the grayscale captured image data are combined (step S143).

制御部１０は、特定したｍの値に基づき、時刻ｔ及び時刻ｔの前後ｍ個の時刻におけるデータユニット（合計２ｍ＋１個）で構成されるユニットグループを生成する（ステップＳ１４４）。制御部１０は、生成したユニットグループのデータユニット値を組み合わせたユニットグループ値Ｘを取得し（ステップＳ１４５）、図１１のフローチャートにおけるステップＳ１５へ処理を戻す。 Based on the specified value of m, the control unit 10 generates a unit group composed of the data units (2m + 1 in total) at time t and the m times before and after it (step S144). The control unit 10 acquires the unit group value X that combines the data unit values of the generated unit group (step S145), and returns the process to step S15 in the flowchart of FIG. 11.

図１１に戻り説明を続ける。制御部１０は、取得したユニットグループ値Ｘに、対象物に関する情報をラベル付けした訓練データセットを生成する（ステップＳ１５）。制御部１０は、ユニットグループ値Ｘに付随するｎ及びｍの値に基づき、図８に示したテーブルを参照して、ｎ及びｍの値の組み合わせにより特定されるグループＩＤを取得し、ユニットグループ値Ｘをグループに分別する（ステップＳ１６）。 Returning to FIG. 11, the description continues. The control unit 10 generates a training data set in which the acquired unit group value X is labeled with information about the object (step S15). Based on the values of n and m associated with the unit group value X, the control unit 10 refers to the table shown in FIG. 8, acquires the group ID specified by the combination of the values of n and m, and sorts the unit group value X into its group (step S16).

制御部１０は、生成した訓練データセットを用いて、時刻ｔにおけるユニットグループ値Ｘを入力した場合に、時刻ｔにおける所在車線を出力する学習モデル１Ｍを生成する（ステップＳ１７）。制御部１０は、各グループ別に生成した学習モデル１Ｍを記憶部１１に格納し、一連の処理を終了する。 Using the generated training data set, the control unit 10 generates a learning model 1M that outputs the location lane at time t when the unit group value X at time t is input (step S17). The control unit 10 stores the learning model 1M generated for each group in the storage unit 11, and ends a series of processes.

なお、上記の各実施形態ではｎ及びｍに基づきユニットグループ値Ｘがグループに分別される例を説明したが、グループ分別はｎ及びｍに基づくものに限定されるものではない。ユニットグループ値Ｘは、ｎ又はｍの一方、すなわち、例えば検出精度又は移動速度のいずれか一方に基づきグループに分別されてもよい。 In each of the above embodiments, an example in which the unit group value X is classified into groups based on n and m has been described, but the group classification is not limited to those based on n and m. The unit group value X may be divided into groups based on either n or m, that is, for example, detection accuracy or movement speed.

本実施形態によれば、撮像画像に応じた検出精度及び移動距離に応じてユニットグループを構成するデータの組み合わせ内容が決定される。撮像画像に含まれる対象物の状態に応じた学習モデル１Ｍを生成することにより、より精度の高い推定処理が可能となる。 According to the present embodiment, the combination content of the data constituting the unit group is determined according to the detection accuracy according to the captured image and the moving distance. By generating the learning model 1M according to the state of the object included in the captured image, more accurate estimation processing becomes possible.

（第３実施形態） (Third Embodiment)
第３実施形態では、学習モデルを用いて推定した対象物に関する情報を提供する。図１３は、第３実施形態における推定システム１１０の構成例を示すブロック図である。以下では、第３実施形態について、第１実施形態と異なる点を説明する。後述する構成を除く他の構成については第１実施形態の学習モデル生成システム１００と同様であるので、共通する構成については同一の符号を付してその詳細な説明を省略する。推定システム１１０は、移動体２の制御ユニット２００及び撮像装置３１を含む。 In the third embodiment, information about the object estimated using the learning model is provided. FIG. 13 is a block diagram showing a configuration example of the estimation system 110 according to the third embodiment. Hereinafter, the differences between the third embodiment and the first embodiment will be described. Since the configurations other than those described later are the same as those of the learning model generation system 100 of the first embodiment, the common configurations are designated by the same reference numerals and detailed description thereof will be omitted. The estimation system 110 includes the control unit 200 of the moving body 2 and the imaging device 31.

第３実施形態の制御ユニット２００は、記憶部２１に、プログラム２Ｐ、複数の学習モデル１Ｍを含む制御部２０が参照するプログラム及びデータを記憶する。記憶部２１に記憶されるプログラム２Ｐは、記録媒体にコンピュータ読み取り可能に記録されている態様であってもよい。記憶部２１は、図示しない読出装置によって記録媒体２Ａから読み出されたプログラム２Ｐを記憶する。また、図示しない通信網に接続されている図示しない外部コンピュータからプログラム２Ｐをダウンロードし、記憶部２１に記憶させたものであってもよい。制御部２０は、プログラム２Ｐを読み出して実行することにより、撮像画像に基づき対象物に関する情報を推定する情報処理装置として機能する。 The control unit 200 of the third embodiment stores, in the storage unit 21, the programs and data referred to by the control unit 20, including the program 2P and the plurality of learning models 1M. The program 2P stored in the storage unit 21 may be provided in a form recorded on a recording medium in a computer-readable manner. The storage unit 21 stores the program 2P read from the recording medium 2A by a reading device (not shown). Alternatively, the program 2P may be downloaded from an external computer (not shown) connected to a communication network (not shown) and stored in the storage unit 21. By reading and executing the program 2P, the control unit 20 functions as an information processing device that estimates information about the object based on the captured image.

第１通信部２２には、移動体内通信回線Ｎ２を介して表示装置３２が更に接続される。表示装置３２は、例えば液晶ディスプレイなどであり、制御ユニット２００から出力される情報を表示する。また表示装置３２は、タッチパネル等の操作部３３を有しており、操作部３３に対するユーザの操作を受け付けて、制御ユニット２００へ受け付けた操作内容を通知する。表示装置３２は、例えばカーナビゲーション装置と共用のものであってよい。 The display device 32 is further connected to the first communication unit 22 via the mobile communication line N2. The display device 32 is, for example, a liquid crystal display or the like, and displays information output from the control unit 200. Further, the display device 32 has an operation unit 33 such as a touch panel, receives the user's operation on the operation unit 33, and notifies the control unit 200 of the received operation content. The display device 32 may be shared with, for example, a car navigation device.

上記のように構成される推定システム１１０にて、学習モデル１Ｍを用いた推定処理が実行される。第３実施形態では、学習モデル１Ｍを用いて、移動体２の走行路を示す白線を含む撮像画像に応じた走行路上の位置の推定結果を出力する。移動体２が所在する走行路上の位置は、例えば移動体２の所在車線で示される。図１４は、推定処理方法を示す概念図である。図１４では、例えばＡ地点からＢ地点までの２地点間における所在車線の推定結果を出力する処理に関して説明する。 In the estimation system 110 configured as described above, the estimation process using the learning model 1M is executed. In the third embodiment, the learning model 1M is used to output the estimation result of the position on the traveling path according to the captured image including the white line indicating the traveling path of the moving body 2. The position on the traveling path where the moving body 2 is located is indicated by, for example, the location lane of the moving body 2. FIG. 14 is a conceptual diagram showing an estimation processing method. In FIG. 14, for example, a process of outputting an estimation result of the location lane between two points from the point A to the point B will be described.

移動体２の制御ユニット２００は、初めに、撮像装置３１にて記録された２地点間の撮像画像を取得する。制御ユニット２００は、ユーザの入力を受け付ける等により、学習モデル１Ｍに対するｎ及びｍの値を決定する。制御ユニット２００は、図８に示したテーブルを参照し、決定された各時点におけるｎ及びｍの組み合わせに応じたグループＩＤを特定する。特定されたグループＩＤに応じて、撮像画像の各フレームがグループに分別される。 The control unit 200 of the moving body 2 first acquires the captured image between the two points recorded by the imaging device 31. The control unit 200 determines the values of n and m for the learning model 1M, for example by accepting user input. The control unit 200 refers to the table shown in FIG. 8 and identifies the group ID corresponding to the combination of n and m determined for each time point. Each frame of the captured image is sorted into a group according to the identified group ID.

分別された撮像画像の各フレームは、各グループに応じたｎ及びｍの値に基づきフレームからユニットグループ値Ｘを生成する前処理が施された後、グループＩＤに対応する学習モデル１Ｍに入力される。各学習モデル１Ｍからは、各時点におけるフレームに対応する所在車線の推定結果が出力される。各時点における推定結果は、時系列に組み合わせられ、一連の推定結果データとして表示装置３２等を介して出力される。 Each frame of the sorted captured image undergoes preprocessing that generates a unit group value X from the frame based on the values of n and m for its group, and is then input to the learning model 1M corresponding to the group ID. Each learning model 1M outputs an estimation result of the location lane corresponding to the frame at each time point. The estimation results at the respective time points are combined in time series and output as a series of estimation result data via the display device 32 or the like.

図１５は、学習モデル１Ｍを用いた推定処理手順の一例を示すフローチャートである。以下の処理は、制御ユニット２００の記憶部２１に記憶してあるプログラム２Ｐに従って制御部２０によって実行される。処理の実行タイミングは、例えば撮像装置３１により新たな動画が記録されたタイミングである。 FIG. 15 is a flowchart showing an example of an estimation processing procedure using the learning model 1M. The following processing is executed by the control unit 20 according to the program 2P stored in the storage unit 21 of the control unit 200. The execution timing of the process is, for example, the timing at which a new moving image is recorded by the image pickup apparatus 31.

制御部２０は、撮像装置３１により撮影し記録された撮像画像を取得する（ステップＳ３１）。撮像画像は、例えばＡ地点からＢ地点までの２地点間における移動体２の外部を撮影した動画像であり、移動体２の走行路を示す白線が含まれる。撮像画像には、撮影時点に関する情報が付随している。 The control unit 20 acquires a captured image shot and recorded by the imaging device 31 (step S31). The captured image is, for example, a moving image of the outside of the moving body 2 taken between two points, from point A to point B, and includes a white line indicating the traveling path of the moving body 2. The captured image is accompanied by information about the time of shooting.

制御部２０は、撮像画像をグレースケールに変換し、グレースケール撮像画像を取得する（ステップＳ３２）。なお、制御部２０は、撮像装置３１からグレースケールで撮影された撮像画像を取得してもよい。制御部２０は、例えばＬａｎｅＮｅｔ等の機械学習モデルを用いて、撮像画像から対象物の抽出を行い、対象物抽出画像を取得する（ステップＳ３３）。第３実施形態では、抽出する対象物は白線であり、対象物抽出画像は２値の白線抽出画像である。制御部２０は、グレースケール撮像画像と白線抽出画像とを関連付けて記憶する。 The control unit 20 converts the captured image into grayscale and acquires the grayscale captured image (step S32). The control unit 20 may acquire a captured image captured in grayscale from the imaging device 31. The control unit 20 extracts an object from the captured image by using a machine learning model such as LaneNet, and acquires the object extracted image (step S33). In the third embodiment, the object to be extracted is a white line, and the object extraction image is a binary white line extraction image. The control unit 20 stores the grayscale captured image and the white line extracted image in association with each other.
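
The grayscale conversion of step S32 can be pictured with a minimal sketch. The source does not state how the conversion is performed; the ITU-R BT.601 luma weights used here are an assumption, as are the function name `to_grayscale` and the nested-list image representation. White line extraction with LaneNet (step S33) is not reproduced.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

def to_grayscale(frame: List[List[RGB]]) -> List[List[int]]:
    """Convert an RGB frame to 8-bit grayscale (assumed BT.601 luma weights)."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in frame
    ]

# A tiny 2x2 frame: white, black, pure red, pure blue.
frame = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
gray = to_grayscale(frame)
```

A camera that already records in grayscale, as the text allows, would make this step unnecessary.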

制御部２０は、取得したグレースケール撮像画像及び白線抽出画像に基づき、ユニットグループ値Ｘを取得する（ステップＳ３４）。図１６は、第３実施形態におけるユニットグループ値Ｘ取得の詳細な手順の一例を示すフローチャートである。図１６のフローチャートに示す処理手順は、図１５のフローチャートにおけるステップＳ３４の詳細に対応する。 The control unit 20 acquires the unit group value X based on the acquired grayscale captured image and white line extraction image (step S34). FIG. 16 is a flowchart showing an example of a detailed procedure for acquiring the unit group value X in the third embodiment. The processing procedure shown in the flowchart of FIG. 16 corresponds to the details of step S34 in the flowchart of FIG. 15.

制御部２０は、操作部３３によりユーザの入力を受け付ける等によりユニットグループ値Ｘを生成するためのｎ及びｍの値を決定する（ステップＳ３４１）。ｎ及びｍは夫々、結合する白線抽出画像の枚数及び組み合わせるデータユニット値の個数を決定するための値である。 The control unit 20 determines the values of n and m for generating the unit group value X by receiving the input of the user by the operation unit 33 (step S341). n and m are values for determining the number of white line extracted images to be combined and the number of data unit values to be combined, respectively.

制御部２０は、決定したｎの値に基づき、時刻ｔにおけるグレースケール撮像画像と、時刻ｔ及び時刻ｔの前後ｎ個の時刻における白線抽出画像（合計２ｎ＋１個）とで構成される時刻ｔデータユニットを生成する（ステップＳ３４２）。制御部１０は、生成した時刻ｔデータユニットにおけるグレースケール撮像画像データと、各時刻の白線抽出画像に基づく結合白線抽出画像データとを組み合わせたデータユニット値を取得する（ステップＳ３４３）。 Based on the determined value of n, the control unit 20 generates a time-t data unit composed of the grayscale captured image at time t and the white line extraction images at time t and at the n times before and after time t (2n+1 images in total) (step S342). The control unit 10 acquires a data unit value that combines the grayscale captured image data in the generated time-t data unit with combined white line extraction image data based on the white line extraction images at the respective times (step S343).
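
Steps S342 and S343 can be sketched as follows. This passage does not specify how the 2n+1 white line extraction images are combined or how the result is paired with the grayscale image, so the pixel-wise OR and the returned pair are assumptions, as are the names `combine_white_lines` and `data_unit_value`.

```python
from typing import List, Tuple

Image = List[List[int]]

def combine_white_lines(masks: List[Image]) -> Image:
    """Pixel-wise OR of binary white line masks (assumed combination rule)."""
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [1 if any(mask[y][x] for mask in masks) else 0 for x in range(width)]
        for y in range(height)
    ]

def data_unit_value(gray: List[Image], masks: List[Image], t: int, n: int) -> Tuple[Image, Image]:
    """Pair the grayscale frame at t with the combined masks of t-n .. t+n."""
    if t - n < 0 or t + n >= len(masks):
        raise ValueError("time t needs n frames on each side")
    window = masks[t - n : t + n + 1]  # 2n + 1 white line extraction images
    return gray[t], combine_white_lines(window)

# Three 1x3 frames; the white line appears in a different pixel over time.
gray = [[[10, 20, 30]], [[40, 50, 60]], [[70, 80, 90]]]
masks = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 0]]]
unit = data_unit_value(gray, masks, t=1, n=1)
```

With n = 1, the combined mask keeps a white line pixel that is missing from the frame at time t itself, which is one way neighbouring frames could compensate for a faint or occluded line.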

制御部１０は、決定したｍの値に基づき、時刻ｔ及び時刻ｔの前後ｍ個の時刻、すなわち時刻ｔ-mから時刻ｔ+mまでの各時刻におけるデータユニット（合計２ｍ＋１個）で構成されるユニットグループを生成する（ステップＳ３４４）。制御部１０は、生成したユニットグループの各時刻におけるデータユニット値を組み合わせたユニットグループ値Ｘを取得し（ステップＳ３４５）、図１５のフローチャートにおけるステップＳ３５へ処理を戻す。なおユニットグループ値Ｘは、ｎ及びｍの値を付随させて取得される。 Based on the determined value of m, the control unit 10 generates a unit group composed of the data units at time t and at the m times before and after time t, that is, at each time from time t-m to time t+m (2m+1 data units in total) (step S344). The control unit 10 acquires a unit group value X that combines the data unit values at the respective times of the generated unit group (step S345), and returns the process to step S35 in the flowchart of FIG. 15. The unit group value X is acquired with the values of n and m attached.
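
Steps S344 and S345 can be sketched in the same spirit. The dictionary layout and the name `unit_group_value` are hypothetical; the source only states that X combines 2m+1 data unit values and carries n and m with it.

```python
from typing import Any, Dict, List

def unit_group_value(unit_values: List[Any], t: int, n: int, m: int) -> Dict[str, Any]:
    """Collect the data unit values at t-m .. t+m and attach n and m."""
    if t - m < 0 or t + m >= len(unit_values):
        raise ValueError("time t needs m data units on each side")
    return {"n": n, "m": m, "units": unit_values[t - m : t + m + 1]}

# Placeholder data unit values for five consecutive times.
unit_values = ["u0", "u1", "u2", "u3", "u4"]
X = unit_group_value(unit_values, t=2, n=1, m=2)
```

Because each data unit already spans n frames on either side, a unit group centred on time t draws on white line extraction images from time t-m-n to time t+m+n.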

図１５に戻り説明を続ける。制御部２０は、ユニットグループ値Ｘに付随するｎ及びｍの値に基づき、図８で説明したｎ及びｍの値とグループＩＤとを関連付けたテーブルを参照し、ｎ及びｍの値の組み合わせにより特定されるグループＩＤを取得する。制御部２０は、取得したグループＩＤに基づき、撮像画像の各フレームをグループに分別する（ステップＳ３５）。 Returning to FIG. 15, the description continues. Based on the values of n and m attached to the unit group value X, the control unit 20 refers to the table described with reference to FIG. 8, which associates the values of n and m with group IDs, and acquires the group ID identified by the combination of the values of n and m. The control unit 20 sorts each frame of the captured image into a group based on the acquired group ID (step S35).
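
The table lookup and sorting of step S35 can be illustrated as below. The contents of the FIG. 8 table are not given in this passage, so `GROUP_TABLE` and its group IDs are hypothetical stand-ins, as is the function name `sort_frames`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for the FIG. 8 table: (n, m) -> group ID.
GROUP_TABLE: Dict[Tuple[int, int], str] = {(1, 1): "G1", (2, 1): "G2", (2, 2): "G3"}

def sort_frames(frame_params: List[Tuple[int, int]]) -> Dict[str, List[int]]:
    """Map each frame index to the group ID selected by its (n, m) pair."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, nm in enumerate(frame_params):
        groups[GROUP_TABLE[nm]].append(index)
    return dict(groups)

# (n, m) attached to the unit group value X of four consecutive frames.
groups = sort_frames([(1, 1), (2, 1), (1, 1), (2, 2)])
```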

制御部２０は、記憶する複数の学習モデル１Ｍから、取得したグループＩＤに対応する学習モデル１Ｍを選択する（ステップＳ３６）。制御部２０は、撮像画像に上述の前処理を施して得られたユニットグループ値Ｘを、選択した学習モデル１Ｍに入力情報として入力する（ステップＳ３７）。制御部２０は、学習モデル１Ｍから出力される対象物に関する情報を取得する（ステップＳ３８）。出力情報は、例えば各フレームに対応する所在車線である。制御部２０は、取得した各時点における推定結果を時系列に組み合わせた一連の推定結果データを生成する（ステップＳ３９）。制御部２０は、生成した推定結果データを撮像画像に関連付けて記憶部１１に記憶するとともに、表示装置３２等を介して推定結果データを出力し（ステップＳ４０）、一連の処理を終了する。 The control unit 20 selects the learning model 1M corresponding to the acquired group ID from the plurality of learning models 1M to be stored (step S36). The control unit 20 inputs the unit group value X obtained by performing the above-mentioned preprocessing on the captured image into the selected learning model 1M as input information (step S37). The control unit 20 acquires information about the object output from the learning model 1M (step S38). The output information is, for example, the location lane corresponding to each frame. The control unit 20 generates a series of estimation result data in which the acquired estimation results at each time point are combined in a time series (step S39). The control unit 20 stores the generated estimation result data in association with the captured image in the storage unit 11, outputs the estimation result data via the display device 32 or the like (step S40), and ends a series of processes.
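
Steps S36 to S39 amount to a dispatch by group ID followed by time-ordered assembly, which the following sketch illustrates. The models here are dummy callables standing in for trained learning models 1M, and `run_estimation` and the tuple layout are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A model is any callable mapping a unit group value X to a lane number.
Model = Callable[[object], int]

def run_estimation(
    inputs: List[Tuple[float, str, object]],
    models: Dict[str, Model],
) -> List[Tuple[float, int]]:
    """inputs holds (timestamp, group ID, unit group value X) per frame."""
    results = []
    for timestamp, group_id, x in inputs:
        model = models[group_id]               # step S36: select the model for the group
        results.append((timestamp, model(x)))  # steps S37-S38: input X, take the output
    return sorted(results, key=lambda r: r[0])  # step S39: assemble in time order

# Dummy stand-ins for trained models, for illustration only.
models = {"G1": lambda x: 1, "G2": lambda x: 2}
inputs = [(0.2, "G2", "X2"), (0.0, "G1", "X0"), (0.1, "G1", "X1")]
series = run_estimation(inputs, models)
```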

上記では、一連の動画像を取得した後に学習モデル１Ｍによる推定処理を実行する例を説明したが、推定処理を実行するタイミングは限定されるものではない。制御部２０は、撮像装置３１で撮影が開始されたタイミングで上述の処理を実行し、リアルタイムで取得した撮像画像に基づき推定結果データを出力してもよい。この場合においては、推定結果データは、一連のデータとして生成されるものでなく随時出力されてよい。 In the above, an example of executing the estimation process by the learning model 1M after acquiring a series of moving images has been described, but the timing of executing the estimation process is not limited. The control unit 20 may execute the above-mentioned processing at the timing when the imaging device 31 starts photographing, and output the estimation result data based on the captured image acquired in real time. In this case, the estimation result data is not generated as a series of data but may be output at any time.

更に、制御部２０は、推定結果データに応じた情報を出力してもよい。例えば、所在車線の推定結果の推移により車両が走行車線から逸脱していると判定される場合には、制御部２０は、表示装置３２又は不図示のスピーカー等を介して画像、警報、音声、振動等による支援情報を出力するものであってよい。制御部２０は、移動体２の装備品に制御信号を出力するものであってもよい。 Further, the control unit 20 may output information according to the estimation result data. For example, when the transition of the location lane estimation results indicates that the vehicle is deviating from its traveling lane, the control unit 20 may output support information in the form of an image, an alarm, a sound, vibration, or the like via the display device 32, a speaker (not shown), or the like. The control unit 20 may also output a control signal to equipment of the moving body 2.
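
The source does not say how a departure is determined from the transition of the estimation results. One possible rule, shown purely as an assumption, is to report an event once a lane other than the current one has been estimated for `hold` consecutive frames, so that single-frame misestimates are ignored. The name `departure_frames` and the default `hold=3` are hypothetical.

```python
from typing import List, Optional

def departure_frames(lanes: List[int], hold: int = 3) -> List[int]:
    """Return frame indices at which a new lane has persisted for `hold` frames."""
    events: List[int] = []
    if not lanes:
        return events
    current = lanes[0]
    candidate: Optional[int] = None
    run = 0
    for i, lane in enumerate(lanes):
        if lane == current:
            candidate, run = None, 0   # back in the current lane: reset
        elif lane == candidate:
            run += 1                   # same new lane seen again
        else:
            candidate, run = lane, 1   # a different lane starts a new run
        if candidate is not None and run >= hold:
            events.append(i)           # sustained change: report it
            current, candidate, run = candidate, None, 0
    return events

# Lane 2 appears once as noise (index 2), then persists from index 5.
events = departure_frames([1, 1, 2, 1, 1, 2, 2, 2, 2, 1], hold=3)
```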

図１７は、表示装置３２で表示される画面例を示す図である。図１７は、推定結果データを含む推定結果画面３２０の一例を示す図である。推定結果画面３２０には、一の時刻における撮像画像３２１と、該撮像画像３２１に対応付けられた推定結果３２２及び推定情報３２３とが含まれる。 FIG. 17 is a diagram showing an example of a screen displayed by the display device 32. FIG. 17 is a diagram showing an example of the estimation result screen 320 including the estimation result data. The estimation result screen 320 includes the captured image 321 at one time, the estimation result 322 and the estimation information 323 associated with the captured image 321.

撮像画像３２１は、記録された動画像から切り出した１フレームの静止画像である。推定結果３２２は、学習モデル１Ｍにより出力された対象物に関する情報であり、例えば所在車線、車両台数、広告数等である。図１７の例では、推定結果３２２は所在車線であり、歩道に近い順に１、２、３等の車線番号を用いて示される。推定情報３２３は、撮像画像に基づく推定処理に関する情報である。図１７の例では、推定情報３２３には、撮像画像のファイル名、撮影時刻、推定に用いたｎ及びｍの値、推定確度の数値を夫々示すテキストデータが含まれている。なお、推定情報３２３は、テキストデータによるものに限定されず、イラスト、音声等によるものであってもよい。なお、推定確度は、確度に応じて、例えば数値の色、大きさ、点滅／点灯、表示状態を変化させて強調表示を行ってもよい。制御部２０は、撮像画像３２１に対応付けて同時点における推定結果３２２及び推定情報３２３を取得する。制御部２０は、取得した撮像画像３２１と、推定結果３２２及び推定情報３２３のテキストデータ等とを含む推定結果画面３２０の画面データを生成し、表示装置３２を介して出力する。ユーザは、表示装置３２により学習モデル１Ｍの推定結果を認識することができる。 The captured image 321 is a one-frame still image cut out from the recorded moving image. The estimation result 322 is information about the object output by the learning model 1M, such as the location lane, the number of vehicles, or the number of advertisements. In the example of FIG. 17, the estimation result 322 is the location lane, indicated by lane numbers such as 1, 2, and 3 in order of proximity to the sidewalk. The estimation information 323 is information related to the estimation process based on the captured image. In the example of FIG. 17, the estimation information 323 includes text data indicating the file name of the captured image, the shooting time, the values of n and m used for estimation, and the numerical estimation confidence. The estimation information 323 is not limited to text data and may be provided as an illustration, voice, or the like. The estimation confidence may be highlighted according to its value, for example by changing the color, size, blinking/lighting, or display state of the number. The control unit 20 acquires the estimation result 322 and the estimation information 323 for the same time point in association with the captured image 321. The control unit 20 generates screen data of the estimation result screen 320 including the acquired captured image 321 and the text data and the like of the estimation result 322 and the estimation information 323, and outputs the screen data via the display device 32. The user can check the estimation result of the learning model 1M on the display device 32.

上記では、学習モデル１Ｍは、制御ユニット２００にて処理に用いられるとして説明した。しかしながらこれに限らず、学習モデル１Ｍは、制御ユニット２００と通信可能に接続された他の情報処理装置に記憶されており、制御ユニット２００から得られる撮像画像に基づいて、他の情報処理装置にて推定結果を出力する処理に用いられてもよい。 In the above, the learning model 1M has been described as being used for processing in the control unit 200. However, this is not a limitation: the learning model 1M may be stored in another information processing device communicably connected to the control unit 200 and used in that device in a process that outputs estimation results based on captured images obtained from the control unit 200.

また、学習モデル１Ｍは制御ユニット２００とは直接通信接続されていないサーバ等の解析装置にて用いられてもよい。解析装置は、制御ユニット２００と通信可能に接続された他の情報処理装置を介して、制御ユニット２００で録画された撮像画像を取得し、取得した撮像画像に基づき学習モデル１Ｍを用いて対象物に関する情報の解析処理を実行してもよい。さらに、解析処理により取得した新たなデータに等に基づき、学習モデル１Ｍは再学習を実行してもよい。解析装置は、新たな修正情報を用いて訓練データを更に作成し、当該訓練データを用いて学習モデル１Ｍの再学習を行う。再学習を行うことにより、学習モデル１Ｍの推定の精度を更に向上させることができる。 The learning model 1M may also be used in an analysis device, such as a server, that has no direct communication connection to the control unit 200. The analysis device may acquire captured images recorded by the control unit 200 via another information processing device communicably connected to the control unit 200, and execute analysis processing of information about the object using the learning model 1M based on the acquired captured images. Further, the learning model 1M may be retrained based on, for example, new data acquired through the analysis processing. The analysis device creates further training data using new correction information and retrains the learning model 1M with that training data. Retraining can further improve the estimation accuracy of the learning model 1M.

本実施形態によれば、撮像画像のフレーム毎に異なる学習モデル１Ｍを用いて推定処理が実行される。各フレームに応じた学習モデル１Ｍを使用することで、高い推定精度の出力情報を取得することができる。 According to this embodiment, the estimation process is executed using a learning model 1M that is different for each frame of the captured image. By using the learning model 1M corresponding to each frame, it is possible to acquire output information with high estimation accuracy.

（第４実施形態） (Fourth Embodiment)
第４実施形態では、撮像画像に応じた検出精度及び移動速度を取得し、ｎ及びｍの値を決定する。以下では、第４実施形態について、第３実施形態と異なる点を説明する。後述する構成を除く他の構成については第３実施形態の推定システム１１０と同様であるので、共通する構成については同一の符号を付してその詳細な説明を省略する。 In the fourth embodiment, the detection accuracy and the moving speed according to the captured image are acquired, and the values of n and m are determined. Hereinafter, the differences between the fourth embodiment and the third embodiment will be described. Since the configurations other than those described later are the same as those of the estimation system 110 of the third embodiment, the common configurations are designated by the same reference numerals and detailed description thereof will be omitted.

図１８は、第４実施形態における学習モデル１Ｍを用いた推定処理手順の一例を示すフローチャートである。以下の処理は、制御ユニット２００の記憶部２１に記憶してあるプログラム２Ｐに従って制御部２０によって実行される。第３実施形態の図１５と共通する処理については同一のステップ番号を付してその詳細な説明を省略する。 FIG. 18 is a flowchart showing an example of an estimation processing procedure using the learning model 1M in the fourth embodiment. The following processing is executed by the control unit 20 according to the program 2P stored in the storage unit 21 of the control unit 200. The same step numbers are assigned to the processes common to those in FIG. 15 of the third embodiment, and detailed description thereof will be omitted.

制御部２０は、撮像装置３１により撮影し記録された撮像画像及び付加情報を取得する（ステップＳ５１）。撮像画像には、移動体２の走行路を示す白線が含まれる。付加情報は、例えば移動体２の速度の履歴データ及び撮像画像のフレームレート等が含まれる。付加情報には、時点に関する情報が対応付けられている。 The control unit 20 acquires the captured image and additional information captured and recorded by the imaging device 31 (step S51). The captured image includes a white line indicating the traveling path of the moving body 2. The additional information includes, for example, historical data of the speed of the moving body 2, the frame rate of the captured image, and the like. Information about the time point is associated with the additional information.

制御部２０は、撮像画像をグレースケールに変換し、グレースケール撮像画像を取得する（ステップＳ３２）。制御部２０は、例えばＬａｎｅＮｅｔ等の機械学習モデルを用いて、撮像画像から対象物の抽出を行い、対象物抽出画像を取得する（ステップＳ３３）。第４実施形態では、抽出する対象物は白線であり、対象物抽出画像は２値の白線抽出画像である。制御部２０は、グレースケール撮像画像と白線抽出画像とを関連付けて記憶する。 The control unit 20 converts the captured image into grayscale and acquires the grayscale captured image (step S32). The control unit 20 extracts an object from the captured image by using a machine learning model such as LaneNet, and acquires the object extracted image (step S33). In the fourth embodiment, the object to be extracted is a white line, and the object extraction image is a binary white line extraction image. The control unit 20 stores the grayscale captured image and the white line extracted image in association with each other.

制御部２０は、取得したグレースケール撮像画像及び白線抽出画像に基づき、ユニットグループ値Ｘを取得する（ステップＳ３４）。図１９は、第４実施形態におけるユニットグループ値Ｘ取得の詳細な手順の一例を示すフローチャートである。図１９のフローチャートに示す処理手順は、図１８のフローチャートにおけるステップＳ３４の詳細に対応する。第３実施形態の図１６と共通する処理については同一のステップ番号を付してその詳細な説明を省略する。 The control unit 20 acquires the unit group value X based on the acquired grayscale captured image and white line extraction image (step S34). FIG. 19 is a flowchart showing an example of a detailed procedure for acquiring the unit group value X in the fourth embodiment. The processing procedure shown in the flowchart of FIG. 19 corresponds to the details of step S34 in the flowchart of FIG. 18. The same step numbers are assigned to the processes common to those in FIG. 16 of the third embodiment, and detailed description thereof will be omitted.

制御部２０は、操作部３３によりユーザの入力を受け付ける等により、学習モデル１Ｍに対し要求する検出精度を取得する（ステップＳ４４１）。制御部２０は、その他機械学習モデル等を用いて判定した検出精度を取得してもよい。 The control unit 20 acquires the detection accuracy required for the learning model 1M by accepting the user's input by the operation unit 33 (step S441). The control unit 20 may acquire the detection accuracy determined by using another machine learning model or the like.

制御部２０は、付加情報として取得した移動体２の速度及び撮像画像のフレームレートに基づき、時刻ｔにおける移動距離を取得する（ステップＳ４４２）。制御部２０は、不図示のテーブルを夫々参照して、検出精度及び移動距離に基づき、ユニットグループ値Ｘを生成するためのｎ及びｍの値を決定する（ステップＳ３４１）。 The control unit 20 acquires the moving distance at time t based on the speed of the moving body 2 acquired as additional information and the frame rate of the captured image (step S442). The control unit 20 refers to each of the tables (not shown) and determines the values of n and m for generating the unit group value X based on the detection accuracy and the moving distance (step S341).
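
Steps S442 and S341 can be illustrated with a short sketch. Reading the "moving distance at time t" as the distance travelled between consecutive frames (speed divided by frame rate) is an assumption, and since the tables are not shown in the source, every threshold below is hypothetical. The split of n by detection accuracy and m by moving distance follows the dependent claims.

```python
def distance_per_frame(speed_kmh: float, fps: float) -> float:
    """Distance in metres travelled between consecutive frames."""
    if fps <= 0:
        raise ValueError("frame rate must be positive")
    return speed_kmh * 1000.0 / 3600.0 / fps

def choose_n(detection_accuracy: float) -> int:
    """Hypothetical table: lower detection accuracy -> more white line images."""
    if detection_accuracy >= 0.9:
        return 1
    if detection_accuracy >= 0.7:
        return 2
    return 3

def choose_m(distance_m: float) -> int:
    """Hypothetical table: shorter distance per frame -> more data units."""
    if distance_m < 0.25:
        return 3
    if distance_m < 0.75:
        return 2
    return 1

d = distance_per_frame(54.0, 30.0)  # 54 km/h is 15 m/s, so 0.5 m per frame at 30 fps
n, m = choose_n(0.8), choose_m(d)
```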

制御部１０は、決定したｍの値に基づき、時刻ｔ及び時刻ｔの前後ｍ個の時刻におけるデータユニット（合計２ｍ＋１個）で構成されるユニットグループを生成する（ステップＳ３４４）。制御部１０は、生成したユニットグループの各時刻におけるデータユニット値を組み合わせたユニットグループ値Ｘを取得し（ステップＳ３４５）、図１８のフローチャートにおけるステップＳ３５へ処理を戻す。 Based on the determined value of m, the control unit 10 generates a unit group composed of the data units at time t and at the m times before and after time t (2m+1 data units in total) (step S344). The control unit 10 acquires a unit group value X that combines the data unit values at the respective times of the generated unit group (step S345), and returns the process to step S35 in the flowchart of FIG. 18.

図１８に戻り説明を続ける。制御部２０は、ユニットグループ値Ｘに付随するｎ及びｍの値に基づき、図８で説明したｎ及びｍの値とグループＩＤとを関連付けたテーブルを参照し、ｎ及びｍの値の組み合わせにより特定されるグループＩＤを取得する。制御部２０は、取得したグループＩＤに基づき、撮像画像の各フレームをグループに分別する（ステップＳ３５）。 Returning to FIG. 18, the description continues. Based on the values of n and m attached to the unit group value X, the control unit 20 refers to the table described with reference to FIG. 8, which associates the values of n and m with group IDs, and acquires the group ID identified by the combination of the values of n and m. The control unit 20 sorts each frame of the captured image into a group based on the acquired group ID (step S35).

制御部２０は、記憶する複数の学習モデル１Ｍから、取得したグループＩＤに対応する学習モデル１Ｍを選択する（ステップＳ３６）。制御部２０は、撮像画像に上述の前処理を施して得られたユニットグループ値Ｘを、選択した学習モデル１Ｍに入力情報として入力する（ステップＳ３７）。制御部２０は、学習モデル１Ｍから出力される対象物に関する情報を取得する（ステップＳ３８）。制御部２０は、取得した各時点における推定結果を時系列に組み合わせた一連の推定結果データを生成する（ステップＳ３９）。制御部２０は、生成した推定結果データを撮像画像に関連付けて記憶部１１に記憶するとともに、表示装置３２等を介して推定結果データを出力し（ステップＳ４０）、一連の処理を終了する。 The control unit 20 selects the learning model 1M corresponding to the acquired group ID from the plurality of learning models 1M to be stored (step S36). The control unit 20 inputs the unit group value X obtained by performing the above-mentioned preprocessing on the captured image into the selected learning model 1M as input information (step S37). The control unit 20 acquires information about the object output from the learning model 1M (step S38). The control unit 20 generates a series of estimation result data in which the acquired estimation results at each time point are combined in a time series (step S39). The control unit 20 stores the generated estimation result data in association with the captured image in the storage unit 11, outputs the estimation result data via the display device 32 or the like (step S40), and ends a series of processes.

本実施形態によれば、要求する検出精度及び移動距離に応じて、撮像画像のフレーム毎に異なる学習モデル１Ｍを用いて推定処理が実行される。撮像画像の状態に応じて用いる学習モデル１Ｍが選択されるため、より高い推定精度の出力情報を取得することができる。 According to the present embodiment, the estimation process is executed using the learning model 1M that is different for each frame of the captured image according to the required detection accuracy and the moving distance. Since the learning model 1M to be used is selected according to the state of the captured image, it is possible to acquire output information with higher estimation accuracy.

なお、上述の各実施形態で説明した各処理シーケンスは限定されるものではなく、その性質に反しない限り、手順の変更を許容し得る。上述の処理シーケンスに対して、例えば各処理ステップの実行順序を変更してもよく、複数の処理ステップを同時に実行させてもよく、一連の処理シーケンスを実行する毎に、各処理ステップの順序が異なるようにしてもよい。 It should be noted that each processing sequence described in each of the above-described embodiments is not limited, and changes in the procedure can be tolerated as long as the properties are not violated. For the above processing sequence, for example, the execution order of each processing step may be changed, or a plurality of processing steps may be executed at the same time. Each time a series of processing sequences are executed, the order of each processing step is changed. It may be different.

なお、上述のように開示された実施の形態はすべての点で例示であって、制限的なものではないと考えられるべきである。各実施例にて記載されている技術的特徴は互いに組み合わせることができ、本発明の範囲は、特許請求の範囲内での全ての変更及び特許請求の範囲と均等の範囲が含まれることが意図される。 The embodiments disclosed above should be considered exemplary in all respects and not restrictive. The technical features described in the respective embodiments can be combined with one another, and the scope of the present invention is intended to include all modifications within the scope of the claims and the range of equivalents of the claims.

１情報処理装置 1 Information processing device
２移動体 2 Moving body
２００制御ユニット 200 Control unit
３１撮像装置 31 Imaging device
３２表示装置 32 Display device
１０，２０制御部 10, 20 Control unit
１１，２１記憶部 11, 21 Storage unit
１Ｐ，２Ｐプログラム 1P, 2P Program
１Ｍ学習モデル 1M Learning model

Claims

1. A method for generating a learning model, comprising:
acquiring a captured image including an object, the captured image being captured by an imaging device mounted on a moving body;
acquiring a plurality of object extraction images, in which the object has been extracted, in association with the captured image; and
generating, based on training data including the acquired captured image, the plurality of object extraction images, and information about the object, a learning model that outputs information about the object when a captured image and a plurality of object extraction images are input.

2. The method for generating a learning model according to claim 1, wherein a plurality of binarized images are acquired, the binarized images being obtained by extracting the object from each of the captured image and captured images adjacent to the captured image in time series.

3. The method for generating a learning model according to claim 1 or 2, comprising:
acquiring a captured image including a white line indicating a traveling path of the moving body;
acquiring a plurality of white line extraction images, in which the white line has been extracted, in association with the captured image; and
generating, based on training data including the acquired captured image, the plurality of white line extraction images, and a position on the traveling path at which the moving body is located, the learning model that outputs the position on the traveling path at which the moving body is located when a captured image including a white line indicating the traveling path of the moving body and a plurality of white line extraction images in which the white line has been extracted are input.

4. The method for generating a learning model according to any one of claims 1 to 3, comprising:
acquiring the captured image at a first time and a plurality of object extraction images at the first time and at a plurality of times before and after the first time; and
generating the learning model based on training data including the acquired plurality of object extraction images at the first time and at the plurality of times before and after the first time, the captured image at the first time, and the information about the object.

5. The method for generating a learning model according to claim 4, wherein the number of the object extraction images is determined based on the detection accuracy of the object included in the object extraction image at the first time.

6. The method for generating a learning model according to claim 4 or 5, comprising:
acquiring captured images and pluralities of object extraction images in a plurality of data units at the first time and at a plurality of times before and after the first time, the plurality of data units including a data unit composed of the captured image at the first time and the plurality of object extraction images at the first time and at the plurality of times before and after the first time; and
generating, based on training data including the acquired captured images and object extraction images in the plurality of data units and the information about the object, the learning model that outputs information about the object when captured images and object extraction images in a plurality of data units are input.

7. The method for generating a learning model according to claim 6, wherein the number of the data units is determined based on the moving speed of the moving body and the frame rate of the captured image at the first time.

8. A program for causing a computer to execute processing comprising:
acquiring a captured image including an object, the captured image being captured by an imaging device mounted on a moving body;
acquiring a plurality of object extraction images, in which the object has been extracted, in association with the captured image; and
inputting the acquired captured image and the acquired plurality of object extraction images to a learning model and outputting information about the object, the learning model having been trained, based on training data including a captured image, a plurality of object extraction images, and information about the object, to output information about the object when a captured image and a plurality of object extraction images are input.

9. The program according to claim 8, for causing the computer to execute processing of acquiring a plurality of binarized images obtained by extracting the object from each of the captured image and captured images adjacent to the captured image in time series.

10. The program according to claim 8 or 9, for causing the computer to execute processing comprising:
acquiring a captured image including a white line indicating a traveling path of the moving body;
acquiring a plurality of white line extraction images, in which the white line has been extracted, in association with the captured image; and
inputting the acquired captured image including the white line indicating the traveling path of the moving body and the acquired plurality of white line extraction images to the learning model and outputting the position on the traveling path at which the moving body is located, the learning model having been trained, based on training data including a captured image, a plurality of white line extraction images, and a position on the traveling path at which the moving body is located, to output the position on the traveling path at which the moving body is located when a captured image including a white line indicating the traveling path of the moving body and a plurality of white line extraction images in which the white line has been extracted are input.

11. The program according to any one of claims 8 to 10, wherein the learning model has been trained based on training data including a plurality of object extraction images at a first time and at a plurality of times before and after the first time, a captured image at the first time, and information about the object.

12. The program according to claim 11, for causing the computer to execute processing comprising:
acquiring the detection accuracy of the object included in the object extraction image at the first time; and
selecting the learning model corresponding to the acquired detection accuracy from among a plurality of types of the learning model prepared according to detection accuracy.

13. The program according to claim 11 or 12, for causing the computer to execute processing comprising:
acquiring captured images and pluralities of object extraction images in a plurality of data units at the first time and at a plurality of times before and after the first time, the plurality of data units including a data unit composed of a captured image at the first time and a plurality of object extraction images at the first time and at a plurality of times before and after the first time; and
inputting the acquired captured images and object extraction images in the plurality of data units to the learning model and outputting information about the object, the learning model having been trained, based on training data including captured images and object extraction images in a plurality of data units and information about the object, to output information about the object when captured images and object extraction images in a plurality of data units are input.

14. The program according to claim 13, for causing the computer to execute processing comprising:
acquiring the moving speed of the moving body and the frame rate of the captured image at the first time; and
selecting the learning model corresponding to the acquired moving speed and frame rate from among a plurality of types of the learning model prepared according to moving speed and frame rate.

15. An information processing device comprising:
a first acquisition unit that acquires a captured image including an object, the captured image being captured by an imaging device mounted on a moving body;
a second acquisition unit that acquires a plurality of object extraction images, in which the object has been extracted, in association with the captured image; and
a generation unit that generates, based on training data including the captured image acquired by the first acquisition unit, the plurality of object extraction images acquired by the second acquisition unit, and information about the object, a learning model that outputs information about the object when a captured image and a plurality of object extraction images are input.

16. An information processing device comprising:
a first acquisition unit that acquires a captured image including an object, the captured image being captured by an imaging device mounted on a moving body;
a second acquisition unit that acquires a plurality of object extraction images, in which the object has been extracted, in association with the captured image;
a learning model trained, based on training data including a captured image, a plurality of object extraction images, and information about the object, to output information about the object when a captured image and a plurality of object extraction images are input; and
an output unit that inputs the captured image acquired by the first acquisition unit and the plurality of object extraction images acquired by the second acquisition unit to the learning model and outputs information about the object.