JP7252775B2

JP7252775B2 - Video analysis support device and method

Info

Publication number: JP7252775B2
Application number: JP2019024915A
Authority: JP
Inventors: 健一森田; 竜慈大竹; 啓太山田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2019-02-14
Filing date: 2019-02-14
Publication date: 2023-04-05
Anticipated expiration: 2039-02-14
Also published as: JP2020135152A

Description

The present invention generally relates to technology that supports video analysis processing, such as detection of a detection target.

Background art in this technical field includes, for example, Japanese Patent Application Laid-Open No. 2018-022234 (Patent Document 1). Patent Document 1 states: "The detection region selection unit 103 sets, in the input image, a vehicle detection region for detecting another vehicle, which is the detection target, based on the road surface region detection result from the road surface detection unit 102 and the time-series verification result from the time-series verification unit 105. If the road surface region interpolation unit 106 has interpolated the road surface region based on the time-series verification result from the time-series verification unit 105, the detection region selection unit 103 sets the vehicle detection region, based on that interpolation result, so as to include the interpolated road surface region."

Patent Document 1: JP 2018-022234 A

In video analysis that detects a detection target (an object such as a person, luggage, an animal, a ship, or a vehicle) in video, an object located where the detection target cannot appear may be mistakenly detected as the detection target, or a mirror image of the detection target reflected on a wall, glass surface, mirror, or the like may be mistakenly detected as the detection target.

Such false detections can be reduced in either of two ways: determine the presence or absence of the detection target only within the region of the video where the detection target may appear (hereinafter, the detection target region), or reject any detection result whose position on the image is not contained in the detection target region. Either process can improve the accuracy of object detection in video analysis.
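The second approach, rejecting detections that fall outside the detection target region, can be sketched as follows. This is a minimal Python sketch under stated assumptions: the detection target region is a polygon in image coordinates, each detection carries a single reference point under a hypothetical `position` key, and the names `point_in_polygon` and `filter_detections` are illustrative and do not come from the patent.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if p lies inside the polygon (image coordinates)."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_detections(detections: List[Dict], region: List[Point]) -> List[Dict]:
    """Keep only detections whose reference point is inside the region.

    The "position" key is a hypothetical field name used for illustration.
    """
    return [d for d in detections if point_in_polygon(d["position"], region)]
```

Using a single reference point per detection is a simplification; a stricter rule could require the whole detected rectangle to lie inside the region.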

However, if the detection target region is inappropriate, the improvement in detection accuracy (in other words, the reduction in false detections) is not fully achieved.

Setting an appropriate detection target region is therefore one of the important factors in improving detection accuracy.

However, setting an appropriate detection target region is difficult for at least one of the following reasons.
- In video captured by a camera, the actual size corresponding to one pixel can differ from one pixel position to another, depending on shooting conditions such as the camera's orientation, magnification (angle of view), and position. Estimating the actual size for each pixel position is difficult.
- If an object that serves as a reference for actual size (the detection target or some other object) appears in the video, the actual size corresponding to each pixel can be expected to be estimated. However, video showing such an object cannot always be captured, and shooting must continue for some time to obtain it. This problem is considered especially serious where objects appear relatively rarely (for example, when the camera is a surveillance camera).
- On-site camera installation work is usually performed during hours when the detection target that the camera is meant to detect is absent. As a result, setting of the detection target region can begin only after installation has finished and a time period arrives in which the detection target is present.

According to one aspect of the present invention, a video analysis support device estimates a relevant region and calculates camera parameters based on input video in an image coordinate system. The device derives the relevant region in a world coordinate system by converting the coordinates of the relevant region from image coordinates to world coordinates using the calculated camera parameters, derives a detection target region in the world coordinate system based on the derived relevant region, and derives the detection target region in the image coordinate system by converting the coordinates of the derived detection target region from world coordinates to image coordinates using the calculated camera parameters.
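The three conversions in this aspect (image to world, derivation in world coordinates, world to image) can be sketched in Python. This is a hedged illustration, not the patented implementation: it assumes the camera parameters form a 3x4 pinhole projection matrix `C`, that the relevant region lies on the plane Z = 0, and that the rule for deriving the world-coordinate detection target region is simply to raise the relevant region by a Z offset. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def invert_3x3(m: Matrix) -> Matrix:
    """Invert a 3x3 matrix via the adjugate; raises if it is singular."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def image_to_world_on_plane(C: Matrix, uv: Sequence[float], z: float = 0.0):
    """Back-project pixel (u, v) onto the world plane Z = z (assumed known)."""
    # With Z fixed, the 3x4 projection reduces to a 3x3 homography H.
    H = [[row[0], row[1], row[2] * z + row[3]] for row in C]
    Hinv = invert_3x3(H)
    u, v = uv
    x, y, w = (r[0] * u + r[1] * v + r[2] for r in Hinv)
    return (x / w, y / w, z)


def world_to_image(C: Matrix, xyz: Sequence[float]) -> Tuple[float, float]:
    """Project world point (X, Y, Z) to image coordinates (u, v)."""
    X, Y, Z = xyz
    u, v, w = (r[0] * X + r[1] * Y + r[2] * Z + r[3] for r in C)
    return (u / w, v / w)


def derive_detection_region(C: Matrix, region_uv, z_offset: float):
    """Image region -> world region -> raised world region -> image region."""
    world_region = [image_to_world_on_plane(C, p) for p in region_uv]
    # Hypothetical setting rule: lift every vertex by z_offset.
    world_target = [(X, Y, Z + z_offset) for X, Y, Z in world_region]
    return [world_to_image(C, p) for p in world_target]
```

Working in world coordinates lets a rule such as "raise by a fixed height" be stated once in real units, regardless of how the camera's perspective scales each pixel.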

According to one aspect of the present invention, an appropriate detection target region can be derived even when the video is a single still image and the detection target does not appear in that still image.

Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiments.

The drawings are as follows.
- An overall configuration diagram of the video analysis system of Embodiment 1 of the present invention.
- A hardware configuration diagram of the video analysis system of Embodiment 1 of the present invention.
- A sequence diagram of the video analysis system of Embodiment 1 of the present invention.
- A diagram showing an example of the shooting range of the video capturing device in the world coordinate system.
- A diagram showing an example of the XY plane of the world coordinate system shown in FIG. 4A.
- A diagram showing an example of the estimated relevant region.
- A diagram showing an example of the relevant region after conversion from image coordinates to world coordinates.
- A diagram showing an example of the detection target region in the world coordinate system.
- A diagram showing an example of the detection target region in the image coordinate system.
- A diagram showing an example of the result of adding an offset to the Z coordinate of the detection target region in the world coordinate system.
- A diagram showing an example of the result of adding a smaller offset to the Z coordinate of the detection target region in the world coordinate system.
- An explanatory diagram of an example setting screen of Embodiment 1 of the present invention.
- An explanatory diagram of an example region confirmation screen of Embodiment 1 of the present invention.
- A diagram showing an example of the shooting range in Embodiment 2 of the present invention.
- A diagram showing an example of the shooting range in Embodiment 2 of the present invention.
- A diagram showing an example of the relevant region estimated in Embodiment 2 of the present invention.
- A diagram showing an example of the setting rule in Embodiment 2 of the present invention.
- A diagram showing an example of multiple detection target regions in the image coordinate system in Embodiment 2 of the present invention.
- A diagram showing an example of multiple detection target regions in the image coordinate system in Embodiment 2 of the present invention.

In the following description, an "interface apparatus" may be one or more interface devices. The one or more interface devices may be at least one of the following:
- One or more I/O (Input/Output) interface devices. An I/O interface device is an interface device for at least one of an I/O device and a remote display computer. The I/O interface device for the display computer may be a communication interface device. The at least one I/O device may be a user interface device, for example, either an input device such as a keyboard or pointing device, or an output device such as a display device.
- One or more communication interface devices. These may be one or more communication interface devices of the same type (for example, one or more NICs (Network Interface Cards)) or two or more communication interface devices of different types (for example, a NIC and an HBA (Host Bus Adapter)).

Also, in the following description, "memory" is one or more memory devices and may typically be a main storage device. At least one memory device in the memory may be a volatile memory device or a non-volatile memory device.

Also, in the following description, a "persistent storage apparatus" is one or more persistent storage devices. A persistent storage device is typically a non-volatile storage device (for example, an auxiliary storage device), specifically, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

Also, in the following description, a "storage apparatus" may be at least the memory out of the memory and the persistent storage apparatus.

Also, in the following description, a "processor" is one or more processor devices. At least one processor device is typically a microprocessor device such as a CPU (Central Processing Unit), but may be another type of processor device such as a GPU (Graphics Processing Unit). At least one processor device may be single-core or multi-core. At least one processor device may be a processor core. At least one processor device may be a processor device in the broad sense, such as a hardware circuit (for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit)) that performs part or all of the processing.

Also, in the following description, information that yields an output for an input may be described using an expression such as "xxx table", but such information may be data of any structure, or may be a learning model, such as a neural network, that generates an output for an input. An "xxx table" can therefore be called "xxx information". Also, in the following description, the configuration of each table is an example; one table may be divided into two or more tables, and all or part of two or more tables may be combined into one table.

Also, in the following description, processing may be described with a "program" as the subject. Because a program, when executed by a processor, performs the defined processing while using a storage apparatus and/or an interface apparatus as appropriate, the subject of the processing may instead be the processor (or a device, such as a controller, that has the processor). A program may be installed on an apparatus such as a computer from a program source. The program source may be, for example, a program distribution server or a computer-readable (for example, non-transitory) recording medium. Also, in the following description, two or more programs may be implemented as one program, and one program may be implemented as two or more programs.

Also, in the following description, functions may be described using the expression "kkk unit". A function may be realized by a processor executing one or more computer programs, or by one or more hardware circuits (for example, an FPGA or an ASIC). When a function is realized by a processor executing a program, the defined processing is performed while using a storage apparatus and/or an interface apparatus as appropriate, so the function may be regarded as at least part of the processor. Processing described with a function as the subject may be processing performed by a processor or by an apparatus having that processor. A program may be installed from a program source. The program source may be, for example, a program distribution computer or a computer-readable recording medium (for example, a non-transitory recording medium). The description of each function is an example; multiple functions may be combined into one function, and one function may be divided into multiple functions.

Also, in the following description, the "video analysis support device" may consist of one or more computers. Specifically, for example, when a computer has a display device and displays information on its own display device, that computer may be the video analysis support device. Also, for example, when a first computer (for example, a server computer) sends display information to a remote second computer (a display computer, for example, a client computer) and the display computer displays that information (that is, when the first computer displays information on the second computer), at least the first computer of the two may be the video analysis support device. In other words, for the video analysis support device to "display display information" may mean displaying it on a display device that the device has, or sending it to a display computer (in which case the display computer displays it).

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is an overall configuration diagram of the video analysis system 100 of Embodiment 1 of the present invention.

The video analysis system 100 of this embodiment includes a video capturing device 101, a video storage device 102, an input device 103, a display device 104, and a server computer 110.

The video capturing device 101 captures video, creates video data, and outputs it. The video capturing device 101 may be a video camera, a still camera, a high-sensitivity camera, a low-light camera, a night-vision camera, a thermal camera, an X-ray camera, or the like, or may consist of multiple units including any of these. In other words, the video capturing device 101 may be a device having at least one camera (for example, a single camera itself).

The video storage device 102 is a storage device that stores video data and outputs it on request. It can be configured using a hard disk drive built into a computer, or a network-connected storage system such as NAS (Network Attached Storage) or SAN (Storage Area Network).

Video output from either the video capturing device 101 or the video storage device 102 is input to the video input unit 111 of the server computer 110. The video analysis system 100 may include both the video capturing device 101 and the video storage device 102, as shown in FIG. 1, or only one of them. When it includes both, the source of the video data input to the video input unit 111 may be switched between the video capturing device 101 and the video storage device 102 as needed, or the video data output from the video capturing device 101 may first be stored in the video storage device 102 and then input to the video input unit 111 from there. In the latter case, the video storage device 102 may be, for example, a cache memory that temporarily holds video data continuously input from the video capturing device 101.

The video data stored in the video storage device 102 and the video data created by the video capturing device 101 may be in any format. For example, the video capturing device 101 may be a video camera, the moving image data it captures may be output as video data, and such video data may be stored in the video storage device 102. Alternatively, the video capturing device 101 may be a still camera, a series of still images it captures at a predetermined interval (at least short enough that a captured object can be tracked) may be output as video data, and such video data may be stored in the video storage device 102.

The input device 103 is an input interface, such as a mouse, keyboard, or touch device, for conveying user operations to the server computer 110. The display device 104 is an output interface such as a liquid crystal display and is used for displaying the video analysis results of the server computer 110, for interactive operation with the user, and so on. The input device 103 and the display device 104 may be integrated, for example by using a touch panel.

The server computer 110 is an example of the video analysis support device. The server computer 110 functions as a device that generates (derives) a mask region (the detection target region in the image coordinate system) for limiting the region subject to object detection processing in the image-coordinate-system video represented by the input video data, and that performs video analysis processing (processing that includes determining whether the detection target is present in the mask region) on the part of the input video data inside the generated mask region. The server computer 110 may instead function as a device that generates the mask region but does not perform the video analysis processing.

The video handled by the server computer 110 may be fixed-point observation video captured at one or more locations, or video captured by a camera mounted on a moving body, such as an in-vehicle camera, a drone-mounted camera, a wearable camera, or an action camera.

The server computer 110 includes a video input unit 111, a camera parameter calculation unit 121, a region estimation unit 123, a region derivation unit 30 (for example, a coordinate transformation unit 122 and a region setting unit 124), a region store 125, a video analysis unit 131, and a management information store 10.

The video input unit 111 receives video data captured by the video capturing device 101, or reads video data from the video storage device 102, and converts it into the data format used inside the server computer 110. Specifically, for example, the video input unit 111 performs video decoding that decomposes a moving image (video data format) into frames (still image data format). The resulting frames are sent to the camera parameter calculation unit 121, the region estimation unit 123, and the video analysis unit 131. That is, in this embodiment, the video input from the video input unit 111 is a still image.
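The decomposition of a moving image into still frames can be sketched as a small generator. This is an illustrative Python sketch only: it assumes a capture object whose `read()` method returns an `(ok, frame)` pair, which is the interface offered by OpenCV's `cv2.VideoCapture`; the patent does not name any decoding library. `iter_frames` and `ListCapture` are hypothetical names, and `ListCapture` is merely a stand-in so the sketch runs without a video file.

```python
from typing import Any, Iterator, List, Tuple


def iter_frames(capture: Any) -> Iterator[Any]:
    """Yield frames one by one until the capture reports no more data.

    `capture` is any object with read() -> (ok, frame).
    """
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield frame


class ListCapture:
    """Stand-in capture backed by a list of frames (for illustration only)."""

    def __init__(self, frames: List[Any]) -> None:
        self._frames = list(frames)

    def read(self) -> Tuple[bool, Any]:
        if self._frames:
            return True, self._frames.pop(0)
        return False, None
```

Each yielded frame would then be handed to the camera parameter calculation, region estimation, and video analysis steps.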

The camera parameter calculation unit 121 estimates camera parameters based on the video received from the video input unit 111. The coordinate transformation unit 122 receives camera parameters and coordinates from the region setting unit 124 and converts between image coordinates and world coordinates. Image coordinates are coordinates on the image plane (coordinates in the image coordinate system), and world coordinates are coordinates in the space where the subject captured in the image exists (coordinates in the world coordinate system). The coordinate transformation unit 122 transforms the coordinates received from the region setting unit 124 and sends the transformed coordinates back to the region setting unit 124. In this embodiment, the camera parameter calculation unit 121 calculates camera parameters from the video each time video is received, but in the present invention at least one of the following may be adopted instead, for example.
- The camera parameter calculation unit 121 may be omitted. In this case, camera parameters may be determined in advance for every camera. For at least one camera, at least some of the camera parameters may be changed manually by the user, for example as described later.
- For some cameras, the camera parameter calculation unit 121 calculates camera parameters from the video each time video is received; for the remaining cameras, the camera parameter calculation unit 121 does not operate and the camera parameters may be determined in advance. At least some of the predetermined camera parameters may be changed manually by the user, for example as described later.

Camera parameters are the numerical data that the coordinate transformation unit 122 needs to convert between image coordinates and world coordinates. They are tied to the camera's pose (for example, the camera position in world coordinates and the camera's pan, tilt, and roll angles), the camera's angle of view (focal length), the aspect ratio of the captured image, and so on. Image coordinates (u, v) and world coordinates (X, Y, Z) can be converted into each other by (Equation 1), and the matrix C in (Equation 1) is the camera parameters. Any known method may be used to derive the camera parameters. For example, multiple frames containing a structure whose world coordinates are known may be received from the video input unit 111, and the camera parameters may be obtained by bundle adjustment on a data set of corresponding points between world coordinates and image coordinates.
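Equation 1 itself is not reproduced in this text. A commonly used relation of this kind, given here only as an assumption based on the pinhole camera model, uses a 3x4 matrix C and a scale factor s (the symbol s and the element names c_ij are introduced for illustration):

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= C \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
C = \begin{bmatrix}
c_{11} & c_{12} & c_{13} & c_{14} \\
c_{21} & c_{22} & c_{23} & c_{24} \\
c_{31} & c_{32} & c_{33} & c_{34}
\end{bmatrix}
```

Under this assumed form, world-to-image conversion is a matrix product followed by division by s, while image-to-world conversion needs one additional constraint, such as a known Z value, because a single pixel corresponds to a ray in space.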

Based on the video received from the video input unit 111, the region estimation unit 123 applies a machine learning technique to estimate the relevant region, which is either a preset region or a region specified by the user through the input device 103.

A machine learning technique for region estimation is, for example, as follows. Consider a region estimation function that divides an image into partial regions, extracts image features from each partial region with a feature extractor, uses a classifier to decide from those features whether each partial-region image belongs to the desired region, and integrates the per-region decisions to present the desired region in the original image. The technique automatically optimizes the parameters of the feature extractor and the classifier using multiple images with ground-truth information (training data), thereby obtaining a region estimation function that also works on unknown images not included in the training data.
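The inference side of such a region estimation function (split into partial regions, extract a feature, classify, merge the results) can be sketched in plain Python. This is a toy sketch under stated assumptions: the image is a grayscale list of rows, the feature extractor `mean_intensity` and any threshold classifier passed in are hand-written stand-ins, and the training step that would optimize their parameters from labeled data is not shown. All names are hypothetical.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel values


def mean_intensity(block: Image) -> float:
    """Toy feature extractor: average pixel value of a partial region."""
    values = [v for row in block for v in row]
    return sum(values) / len(values)


def estimate_region(
    image: Image,
    patch: int,
    extract: Callable[[Image], float],
    classify: Callable[[float], bool],
) -> List[List[bool]]:
    """Return a per-pixel mask: True where the patch was classified as the region."""
    h, w = len(image), len(image[0])
    mask = [[False] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Cut out one partial region (smaller at the image border).
            block = [row[left:left + patch] for row in image[top:top + patch]]
            label = classify(extract(block))
            # Merge: write the patch decision back into the full-size mask.
            for y in range(top, min(top + patch, h)):
                for x in range(left, min(left + patch, w)):
                    mask[y][x] = label
    return mask
```

In a learned system, `extract` and `classify` would be replaced by models whose parameters are fitted to the training data rather than fixed by hand.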

Deep learning is known as a specific example of such a technique. One known way to extract feature parameters with deep learning is feature extraction using a convolutional neural network. In a convolutional neural network, many input/output functions (activation functions) called neuron units are combined for each small image region, and these are stacked in multiple layers to form a pyramid structure. With this method, classifier parameters are extracted for each layer of neuron units so that the object can be identified step by step while the position and image size of the object to be detected are varied, and classifier parameters capable of identifying the whole object are finally obtained. Region estimation methods that apply convolutional neural networks are generally called semantic segmentation, and known network models include FCN, U-Net, SegNet, PSPNet, and Mask R-CNN.

The region setting unit 124 applies a setting rule prepared in advance, or a setting rule specified by the user through the input device 103, to the relevant region estimated by the region estimation unit 123, and derives the detection target region using the coordinate transformation performed by the coordinate transformation unit 122. The detection target region derived by the region setting unit 124 is saved in the region store 125. If the server computer 110 does not have the video analysis unit 131, the derived detection target region may be output to a file or sent to an external system. The derivation procedure is described in detail later. The region setting unit 124 may also display the image-coordinate-system detection target region held in the region store 125 on the display device 104.

The video analysis unit 131 performs video analysis processing on the frame (the video as a still image) received from the video input unit 111, restricted to the detection target area acquired from the area store 125. The video analysis processing may be detection processing that includes determining whether a detection target exists in the detection target area, and may include other processing (for example, object tracking or feature extraction). The object here may be any general object such as a person, an animal, a vehicle, a ship, or a piece of luggage, and any of these may be designated as the detection target. The video analysis result is processed into information suitable for display and then sent to the display device 104.

The video analysis unit 131 may instead perform video analysis processing on the entire frame received from the video input unit 111 and discard any result that does not fit within the detection target area. For example, when the video analysis unit 131 performs object detection on a frame received from the video input unit 111, it may perform filtering that includes the following.
- If the area of a detection result fits within the detection target area received from the area store 125, the detection result is output.
- If the area of a detection result for any object does not fit within the detection target area received from the area store 125, the detection result is not output.
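
The two filtering rules above can be sketched as follows. This is a minimal illustration rather than the implementation described in the patent: it assumes each detection result is an axis-aligned bounding box in image coordinates and that the detection target area is a convex polygon, so that a box fits when all four of its corners lie inside. The names `point_in_polygon`, `box_fits`, and `filter_detections` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (u, v) in image coordinates
Box = Tuple[float, float, float, float]   # (u_min, v_min, u_max, v_max)


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test. Points exactly on an edge may fall on either side."""
    u, v = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (u1, v1), (u2, v2) = polygon[i], polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (v1 > v) != (v2 > v):
            u_cross = u1 + (v - v1) * (u2 - u1) / (v2 - v1)
            if u < u_cross:
                inside = not inside
    return inside


def box_fits(box: Box, polygon: List[Point]) -> bool:
    """True if all four box corners are inside the polygon (assumed convex)."""
    u0, v0, u1, v1 = box
    corners = [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]
    return all(point_in_polygon(c, polygon) for c in corners)


def filter_detections(boxes: List[Box], polygon: List[Point]) -> List[Box]:
    """Keep only the detection results that fit within the detection target area."""
    return [b for b in boxes if box_fits(b, polygon)]
```

For a non-convex detection target area, checking the corners alone is not sufficient, and a mask-based overlap test would be a more robust choice.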

図２は、本発明の実施例１の映像解析システム１００のハードウェア構成図である。 FIG. 2 is a hardware configuration diagram of the video analysis system 100 of Example 1 of the present invention.

サーバ計算機１１０は、例えば、相互に接続されたプロセッサ２０１および記憶装置２０２を有する一般的な計算機である。プロセッサ２０１は、映像解析処理の演算が可能な任意の処理装置によって構成される。例えば、プロセッサ２０１は、ＣＰＵ、ＧＰＵ、ＦＰＧＡ、ＡＳＩＣのいずれかを含んでもよい。記憶装置２０２は任意の種類の記憶媒体によって構成される。例えば、記憶装置２０２は、半導体メモリ及びハードディスクドライブを含んでもよい。 The server computer 110 is, for example, a general computer having processors 201 and storage devices 202 interconnected. The processor 201 is configured by any processing device capable of computation of video analysis processing. For example, processor 201 may include any of a CPU, GPU, FPGA, and ASIC. Storage device 202 comprises any type of storage medium. For example, storage device 202 may include semiconductor memory and hard disk drives.

In this example, the functional units shown in FIG. 1, such as the video input unit 111, the camera parameter calculation unit 121, the coordinate transformation unit 122, the area estimation unit 123, the area setting unit 124, and the video analysis unit 131, are realized by the processor 201 executing the processing program 203 stored in the storage device 202. In other words, the processing attributed to each functional unit is actually executed by the processor 201 in accordance with the instructions written in the processing program 203. The area store 125 and the management information store 10 may each be a storage area in the storage device 202. The management information store 10 stores management information that includes information indicating setting rules. The management information may also include, for example, camera parameters.

サーバ計算機１１０は、さらに、プロセッサに接続されたネットワークインターフェース装置（ＮＩＦ）２０４を含む。映像撮影装置１０１は、例えば、ネットワークインターフェース装置２０４を介してサーバ計算機１１０に接続される。映像記憶装置１０２は、ネットワークインターフェース装置２０４を介してサーバ計算機１１０に接続されたＮＡＳまたはＳＡＮであってもよいし、記憶装置２０２に含まれてもよい。 Server computer 110 further includes a network interface device (NIF) 204 connected to the processor. The image capturing device 101 is connected to the server computer 110 via the network interface device 204, for example. The video storage device 102 may be NAS or SAN connected to the server computer 110 via the network interface device 204 or may be included in the storage device 202 .

入力デバイス１０３および表示デバイス１０４は、サーバ計算機１１０に接続されたクライアント計算機が有する入力デバイスおよび表示デバイスでよい。 The input device 103 and display device 104 may be the input device and display device of the client computer connected to the server computer 110 .

図３は、本発明の実施例１の映像解析システム１００のシーケンス図である。 FIG. 3 is a sequence diagram of the video analysis system 100 of Example 1 of the present invention.

図３を用いて、映像解析システム１００の各構成要素の動作シーケンスについて述べる。 The operation sequence of each component of the video analysis system 100 will be described with reference to FIG.

First, the video input unit 111 extracts frames from the video received from the video capturing device 101 or the video storage device 102. The extracted frames are sent to the camera parameter calculation unit 121 (step S301) and the area estimation unit 123 (step S302). If the video contains objects other than the background, the video input unit 111 may derive a frame corresponding to the background image, either by selecting frames that consist only of background or by generating such a frame through learning on the video data. In that case, the derived background frame is sent to the camera parameter calculation unit 121 and the area estimation unit 123 in steps S301 and S302.
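
One simple way to obtain a frame corresponding to the background image is a per-pixel temporal median over several frames. The sketch below uses that technique as a stand-in; the text does not specify how the background frame is generated, so this is an assumption. Frames are represented as nested lists of grayscale values, and `median_background` is a hypothetical name.

```python
from statistics import median
from typing import List

Frame = List[List[int]]  # grayscale image as rows of pixel values


def median_background(frames: List[Frame]) -> Frame:
    """Per-pixel median across frames; moving objects are suppressed
    as long as each pixel shows background in most of the frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(f[r][c] for f in frames) for c in range(width)]
        for r in range(height)
    ]
```

The median works only if every pixel is unobstructed in more than half of the sampled frames, so the frames should be spread over a long enough interval.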

次に、カメラパラメータ算出部１２１は、映像入力部１１１より受け付けたフレームを基にカメラパラメータを算出する。算出されたカメラパラメータは領域設定部１２４に送られる（ステップＳ３０３）。カメラパラメータの算出において、入力デバイス１０３によるユーザの入力情報を用いる場合は、入力デバイス１０３によるユーザの入力情報がカメラパラメータ算出部１２１に送られる（ステップＳ３０４）。あるいは、カメラパラメータ算出部１２１は、予め記憶装置２０２内に格納されたカメラパラメータ算出に必要な情報（例えばカメラ姿勢および画角）を読み出してカメラパラメータを算出してもよい。なお、カメラパラメータ算出部１２１は、カメラパラメータの算出を行わず、予め記憶装置２０２内に格納されたカメラパラメータを読み出して使用してもよい。この場合、映像入力部１１１より受け付けたフレームに対応するカメラパラメータを読み出すものとする。 Next, the camera parameter calculator 121 calculates camera parameters based on the frames received from the video input unit 111 . The calculated camera parameters are sent to the area setting unit 124 (step S303). When the user's input information from the input device 103 is used in calculating the camera parameters, the user's input information from the input device 103 is sent to the camera parameter calculation unit 121 (step S304). Alternatively, the camera parameter calculation unit 121 may read information necessary for camera parameter calculation (for example, camera posture and angle of view) stored in advance in the storage device 202 to calculate camera parameters. Note that the camera parameter calculation unit 121 may read and use camera parameters stored in advance in the storage device 202 without calculating the camera parameters. In this case, it is assumed that the camera parameters corresponding to the frame received from the video input unit 111 are read.

次に、領域推定部１２３は、映像入力部１１１より受け付けたフレームに対し、予め設定された領域を推定する。領域推定部１２３は、単一の領域を推定してもよいし、同一種類の複数の領域を推定してもよいし、異種の複数の領域を推定してもよい。推定された領域は、領域設定部１２４に送られる（ステップＳ３０５）。なお、領域設定部１２４に送られる領域は、どのようなデータ形式であってもよい。例えば、領域を示す画像であってもよいし、領域の輪郭を示す点集合であってもよい。 Next, the area estimation unit 123 estimates a preset area for the frame received from the video input unit 111 . The area estimation unit 123 may estimate a single area, multiple areas of the same type, or multiple areas of different types. The estimated area is sent to the area setting unit 124 (step S305). Note that the area sent to the area setting unit 124 may be in any data format. For example, it may be an image showing the area, or a set of points showing the outline of the area.

The area setting unit 124 derives the detection target area from the area received from the area estimation unit 123 and the camera parameters received from the camera parameter calculation unit 121. If the derivation requires user input, it accepts that input through the input device 103 (step S306). Because deriving the detection target area involves converting the points that make up the area between image coordinates and world coordinates, the area setting unit 124 passes the coordinates of those points and the camera parameters to the coordinate transformation unit 122 and receives the result converted from image coordinates to world coordinates, or from world coordinates to image coordinates (step S307). The point coordinates passed to the coordinate transformation unit 122 may be a set of coordinates sampled at equal intervals from the coordinates inside the area, or a set of coordinates on the contour line of the area. When contour coordinates are used, the area setting unit 124 performs contour extraction on the area received from the area estimation unit 123. Step S307 appears only once in FIG. 3 but may be performed multiple times. The derived detection target area is sent to the area store 125 (step S308). At the same time as step S308, the video analysis unit 131 may be notified that the detection target area in the area store 125 has been updated. The detection target area may also be sent via the NIF 204 to a system or device other than the video analysis system 100.
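
The coordinate conversion in step S307 can be illustrated with a pinhole camera model. The sketch below assumes the camera parameters are a 3x4 projection matrix `C` (so that `[s*u, s*v, s] = C [X, Y, Z, 1]`) and that image points are converted back to world coordinates on a known plane `Z = const`; the text does not fix either choice. The function names and the sample matrix `C_TOP_DOWN` (a camera 10 units above the origin looking straight down) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def world_to_image(C: Matrix, X: float, Y: float, Z: float) -> Tuple[float, float]:
    """Project a world point to image coordinates (u, v)."""
    w = [X, Y, Z, 1.0]
    su, sv, s = (sum(row[k] * w[k] for k in range(4)) for row in C)
    return su / s, sv / s


def inverse_3x3(M: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def image_to_world(C: Matrix, u: float, v: float, Z: float = 0.0) -> Tuple[float, float, float]:
    """Back-project an image point onto the world plane at height Z."""
    # On a fixed plane the projection reduces to a 3x3 homography in (X, Y, 1).
    H = [[row[0], row[1], row[2] * Z + row[3]] for row in C]
    Hi = inverse_3x3(H)
    x, y, s = (r[0] * u + r[1] * v + r[2] for r in Hi)
    return x / s, y / s, Z


# Illustrative matrix: focal length 100, principal point (320, 240),
# camera at (0, 0, 10) looking straight down.
C_TOP_DOWN = [
    [100.0, 0.0, -320.0, 3200.0],
    [0.0, -100.0, -240.0, 2400.0],
    [0.0, 0.0, -1.0, 10.0],
]
```

A single image point does not determine a world point by itself, which is why the back-projection needs the plane height `Z` as an extra input.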

上記のステップＳ３０１～Ｓ３０８の処理により、検知対象領域の導出および登録が完了する（図３において“Ｒ”）。上記の処理により、映像解析システム１００は、検知対象物が映像に映っていなくても当該映像から検知対象物の出現する可能性ある領域である検知対象領域を導出することが可能である。 Derivation and registration of the detection target area are completed by the above steps S301 to S308 (“R” in FIG. 3). Through the above processing, the video analysis system 100 can derive a detection target region, which is a region in which the detection target may appear from the video, even if the detection target is not shown in the video.

次に映像解析部１３１は、領域ストア１２５の登録済みの検知対象領域を読み出す（ステップＳ３０９）。映像解析部１３１は、検知対象領域を表示デバイス１０４によりユーザに提示してもよく、入力デバイス１０３を用いたユーザによる検知対象領域の修正を受け付けてもよい（ステップＳ３１０）。 Next, the video analysis unit 131 reads the registered detection target area of the area store 125 (step S309). The video analysis unit 131 may present the detection target area to the user through the display device 104, and may receive correction of the detection target area by the user using the input device 103 (step S310).

次に映像解析部１３１は、映像入力部１１１よりフレームを受け付け（ステップＳ３１１）、フレームの検知対象領域内について映像解析処理を実施する。映像解析結果は、表示デバイス１０４に送られる（ステップＳ３１２）。なお、映像解析部１３１は、映像入力部１１１より受け付けたフレーム全体について映像解析処理を実施した後に、検知対象領域内のみを映像解析結果として選択するフィルタ処理を行ってもよい。また、映像解析部１３１の映像解析結果は、表示デバイス１０４に送られず、記憶装置２０２に保存されてもよいし、ＮＩＦ２０４を介して映像解析システム１００とは異なるシステムに送られてもよい。 Next, the video analysis unit 131 receives a frame from the video input unit 111 (step S311), and performs video analysis processing on the detection target area of the frame. The video analysis result is sent to the display device 104 (step S312). Note that the image analysis unit 131 may perform image analysis processing on the entire frame received from the image input unit 111, and then perform filter processing for selecting only the detection target area as the image analysis result. Also, the video analysis result of the video analysis unit 131 may be saved in the storage device 202 without being sent to the display device 104 or may be sent to a system different from the video analysis system 100 via the NIF 204 .

上記のステップＳ３０１～ステップＳ３１２により、検知対象領域の導出、登録および検知対象領域を用いた映像解析処理が完了する。上記の処理により、映像解析システム１００は、検知対象領域内についての映像解析結果のみを導出することが可能である。 Through steps S301 to S312 described above, the derivation and registration of the detection target area and the video analysis processing using the detection target area are completed. Through the above processing, the video analysis system 100 can derive only the video analysis result within the detection target area.

Next, suppose the following sequence is performed periodically or intermittently: the camera parameter calculation unit 121 receives a frame from the video input unit 111 (step S321), calculates camera parameters, and sends them to the area setting unit 124 (step S322). If the area setting unit 124 then detects a change in the camera parameters, it has the coordinate transformation unit 122 convert the image-coordinate detection target area to world coordinates using the camera parameters from before the change (step S323), and then convert the resulting world-coordinate detection target area back to image coordinates using the camera parameters from after the change (step S327). The detection target area is thereby corrected (changed) to follow the change in the camera parameters. The corrected detection target area may be sent to the area store 125 to update the detection target area it holds (step S324). When the user has modified the detection target area in step S306, the processing of steps S321 to S324 makes it unnecessary for the user to repeat that modification even if the camera's orientation or angle of view changes.
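
The correction described above amounts to composing two conversions: image to world with the old camera parameters, then world to image with the new ones. The following sketch assumes the camera parameters are 3x4 projection matrices and that the detection target area lies on the world plane Z = 0; both are assumptions for illustration. `follow_camera_change` and the sample matrices `C_OLD` and `C_NEW` (a top-down camera raised from height 10 to height 20) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def ground_homography(C: Matrix) -> Matrix:
    """3x3 map from world (X, Y, 1) on the plane Z = 0 to image (s*u, s*v, s)."""
    return [[row[0], row[1], row[3]] for row in C]


def inverse_3x3(M: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply(H: Matrix, p: Point) -> Point:
    """Apply a homography to a 2D point and dehomogenize."""
    x, y, s = (r[0] * p[0] + r[1] * p[1] + r[2] for r in H)
    return x / s, y / s


def follow_camera_change(region: List[Point], C_old: Matrix, C_new: Matrix) -> List[Point]:
    """Re-express an image-coordinate region under changed camera parameters."""
    to_world = inverse_3x3(ground_homography(C_old))  # image -> world, old parameters
    to_image = ground_homography(C_new)               # world -> image, new parameters
    return [apply(to_image, apply(to_world, p)) for p in region]


# Illustrative matrices: same top-down camera at height 10, then at height 20.
C_OLD = [[100.0, 0.0, -320.0, 3200.0], [0.0, -100.0, -240.0, 2400.0], [0.0, 0.0, -1.0, 10.0]]
C_NEW = [[100.0, 0.0, -320.0, 6400.0], [0.0, -100.0, -240.0, 4800.0], [0.0, 0.0, -1.0, 20.0]]
```

Because the region is stored in image coordinates but anchored to the world, any manual edits the user made to it are carried across the camera change without being redone.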

図４Ａ～図４Ｈを参照して、本発明の実施例１の映像解析システム１００による領域設定手順の一例を説明する。 An example of the region setting procedure by the video analysis system 100 according to the first embodiment of the present invention will be described with reference to FIGS. 4A to 4H.

図４Ａは、映像撮影装置１０１の世界座標系の撮影範囲の一例を示す。図４Ａによれば、家屋、歩道および車道が被写体となるように街頭に映像撮影装置１０１が設置されている。図４Ａに示した矢印は、世界座標（X, Y, Z）が存在する世界座標系を示す。画像座標（u,v）が存在する画像座標系は、図中記載を省略するが、図４Ａが示すフレーム枠左上を原点とし、原点から右へ水平に延びた軸がu軸となり、原点から鉛直に下へ延びた軸がv軸となる座標系である。 FIG. 4A shows an example of the shooting range of the world coordinate system of the video shooting device 101 . According to FIG. 4A, a video camera 101 is installed on the street so that a house, a sidewalk, and a roadway are subjects. The arrows shown in FIG. 4A indicate the world coordinate system in which the world coordinates (X, Y, Z) exist. The image coordinate system in which the image coordinates (u, v) exist is omitted from the drawing, but the origin is the upper left corner of the frame frame shown in FIG. 4A, and the u axis extends horizontally to the right from the origin. It is a coordinate system in which the vertically downward axis is the v-axis.

図４Ｂは、図４Ａに示した世界座標系のXY平面（Z＝０）に沿って、家屋、歩道および車道の位置関係を示している。図４Ａ中における黒い領域８１は人物であり、領域設定手順には直接関連がないが参考として記載した。 FIG. 4B shows the positional relationship of houses, sidewalks and roadways along the XY plane (Z=0) of the world coordinate system shown in FIG. 4A. A black area 81 in FIG. 4A is a person, and although it is not directly related to the area setting procedure, it is described for reference.

図４Ｃ～図４Ｈを用いて、図４Ａに例示の撮影範囲の背景画像のフレームに対し、映像解析システム１００を用いて検知対象領域を抽出する例について説明する。 4C to 4H, an example of extracting a detection target region using the video analysis system 100 for a frame of a background image in the shooting range illustrated in FIG. 4A will be described.

FIG. 4C shows the result of the area estimation unit 123 estimating the roadway area of FIG. 4A in step S305. This example describes estimating the roadway area, but an object other than a roadway may be the estimation target. However, when a deep-learning-based semantic segmentation method is used for area estimation, choosing a target whose appearance is likely to be similar regardless of location makes training data easier to collect and can be expected to broaden, through generalization, the range of sites where the method applies. Roadways are generally considered to vary less in appearance from place to place than houses or sidewalks, which is why the roadway is the estimation target in this embodiment. In cases other than the example of FIG. 4A, the estimation target of the area estimation unit 123 may be, for example, a checkout counter or entrance in a commercial facility, the sea surface at a wharf, the road, a tollgate, or a service area entrance on an expressway, or a bridge pier, the sky, or the horizon near the sea.

The hatched area in FIG. 4D is the result of converting the estimated roadway area of FIG. 4C from image coordinates to world coordinates in step S307. The XY plane shown in FIG. 4D is the plane containing the estimated roadway area of FIG. 4C, and is taken to be the XY plane (Z = 0) shown in FIG. 4C. However, world coordinates may be assigned to image coordinates in any way, and the XYZ axes may be defined so that a plane other than the one shown in FIG. 4 becomes the XY plane.

The hatched area in FIG. 4E is the area that the area setting unit 124 estimates as sidewalk by applying an offset correction to both edges of the estimated roadway area converted to world coordinates in FIG. 4D. In the example of FIG. 4E, the offset correction applied to the relevant area in the world coordinate system yields, as the detection target area in the world coordinate system, an area that does not include the relevant area itself. Although FIG. 4E designates a range of arbitrary distance along both outer edges of the estimated roadway area as sidewalk, any area designation method may be used as long as it refers to the area estimated by the area estimation unit 123.
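
The offset correction can be sketched in world coordinates as follows. This is a simplified illustration that assumes the estimated roadway runs parallel to the Y axis, so that its left and right edges are the minimum and maximum X of its points; a real roadway would generally need a polygon offset instead. `sidewalk_strips` and its width parameters are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (X, Y) on the world plane Z = 0


def sidewalk_strips(
    road: List[Point], left_width: float, right_width: float
) -> Tuple[List[Point], List[Point]]:
    """Return two rectangles adjacent to the left and right road edges.

    The strips lie outside the road extent, so the derived area does not
    include the estimated roadway itself.
    """
    xs = [p[0] for p in road]
    ys = [p[1] for p in road]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    left = [(x0 - left_width, y0), (x0, y0), (x0, y1), (x0 - left_width, y1)]
    right = [(x1, y0), (x1 + right_width, y0), (x1 + right_width, y1), (x1, y1)]
    return left, right
```

Working in world coordinates lets the strip widths be given in physical units, which would not be possible with a fixed pixel offset in a perspective image.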

The hatched area in FIG. 4F shows, drawn on the frame, the estimated sidewalk area from the area setting unit 124 after conversion to image coordinates in step S307. As described above, the processing explained in the order of FIGS. 4C, 4D, 4E, and 4F makes it possible to estimate the sidewalk area from the roadway estimation result even when the area estimation unit 123 does not estimate sidewalks.

Furthermore, in FIG. 4E, obtaining the image coordinates that correspond to different Z values for the estimated sidewalk area on the XY plane of the world coordinate system gives the positions in the image of the space above the estimated sidewalk area. This makes it possible to obtain the area of the image covered when an object of arbitrary height moves over the estimated sidewalk area. The area setting unit 124 thus derives the detection target area from the estimated sidewalk area converted to image coordinates at multiple Z coordinates. FIGS. 4G and 4H both show the range in the frame where a person moving over the estimated sidewalk area can appear. FIG. 4G shows the result for a taller person than FIG. 4H; that is, in FIG. 4G the detection target area is derived using the image coordinates of the estimated sidewalk area at a larger Z than in FIG. 4H. In this way, the processing described in the order of FIGS. 4C, 4D, 4E, 4F, and 4G, or FIGS. 4C, 4D, 4E, 4F, and 4H, makes it possible to set a detection target area that takes the size of the detection target into account, based on the area estimated by the area estimation unit 123.
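
A sketch of deriving the image area for an object of a given height: project the ground region at Z = 0 and at Z = height, then take the convex hull of all projected points. It assumes a 3x4 projection matrix `C` as the camera parameters, a convex ground region, and all points in front of the camera; under those assumptions the hull is the image of the prism swept by the object. `appearance_area`, `convex_hull`, and the sample matrix are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def world_to_image(C: Matrix, X: float, Y: float, Z: float) -> Point:
    """Project a world point to image coordinates (u, v)."""
    w = [X, Y, Z, 1.0]
    su, sv, s = (sum(row[k] * w[k] for k in range(4)) for row in C)
    return su / s, sv / s


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain convex hull."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def appearance_area(ground: List[Point], height: float, C: Matrix) -> List[Point]:
    """Image polygon covering an object of the given height over the ground region."""
    projected = [world_to_image(C, x, y, z) for (x, y) in ground for z in (0.0, height)]
    return convex_hull(projected)


# Illustrative matrix: camera at (0, 0, 10) looking straight down.
C_TOP_DOWN = [[100.0, 0.0, -320.0, 3200.0], [0.0, -100.0, -240.0, 2400.0], [0.0, 0.0, -1.0, 10.0]]
```

Calling `appearance_area` with a larger `height` yields a larger image polygon, which corresponds to the difference between FIG. 4G and FIG. 4H.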

図５は、ユーザがカメラと検知対象に適した検知対象領域設定ルールを設定する際の設定画面である。領域設定部１２４は、設定画面を通じて設定されたルールに基づき、図４を用いて説明した手順で検知対象領域を設定する。また、図５および図６におけるＵＩ（ユーザインターフェース）は、ＧＵＩ（Graphical User Interface）部品でよい。このため、図５および図６に示す画面は、ＧＵＩ画面でよい。 FIG. 5 is a setting screen when the user sets a detection target area setting rule suitable for a camera and a detection target. The region setting unit 124 sets the detection target region according to the procedure described with reference to FIG. 4 based on the rule set through the setting screen. Also, the UI (user interface) in FIGS. 5 and 6 may be a GUI (Graphical User Interface) part. Therefore, the screens shown in FIGS. 5 and 6 may be GUI screens.

図５に示す画面には、例えばＵＩ５０１～５０４がある。 The screen shown in FIG. 5 includes UIs 501 to 504, for example.

はじめに、ユーザはカメラ選択ＵＩ５０１のドロップダウンリストより検知対象ルールを設定したい映像撮影装置１０１または映像記憶装置１０２に含まれるカメラまたは映像を選択する。カメラ選択ＵＩ５０１は、一つ以上のカメラのうちの所望のカメラの選択を受け付けるユーザインターフェースである。 First, the user selects a camera or video included in the video shooting device 101 or video storage device 102 for which a detection target rule is to be set from the dropdown list of the camera selection UI 501 . A camera selection UI 501 is a user interface that receives selection of a desired camera from one or more cameras.

次に、ユーザは、検知対象選択ＵＩ５０２において検知対象領域設定ルールを指定する。ここでは、ユーザはルール名を新規に入力してもよいし、ドロップダウンリストに表示される候補を選択してもよい。 Next, the user specifies a detection target area setting rule on the detection target selection UI 502 . Here, the user may enter a new rule name or select a candidate displayed in the drop-down list.

Next, the user enters the detection target size in the size input UI 503. This example shows only the height being entered as the detection target size, but a more complex shape including width, depth, and so on may be specified, in which case the setting screen is assumed to have input UIs corresponding to those detection target size inputs.

Next, in the rule designation UI 504, the user enters a setting rule for defining another area, in world coordinates, based on the area estimated by the area estimation unit 123 (black in the figure). The rule designation UI 504 in FIG. 5 shows an example of entering a rule that defines two areas in total (hatched in the figure): one area of a specified width on the left edge of the estimated area and one area of a specified width on its right edge. The rule designation UI 504 may display a three-dimensional image instead of a two-dimensional image. The detection target selection UI 502, the size input UI 503, and the rule designation UI 504 may be examples of setting UIs.

以上の通り、設定画面によりユーザは、領域設定部１２４が検知対象領域を設定するための領域設定ルールを設定することが可能である。 As described above, the setting screen allows the user to set the area setting rule for the area setting unit 124 to set the detection target area.

図６は、ユーザがカメラと検知対象に適した検知対象領域設定ルールが設定されているかを確認するための領域確認画面である。 FIG. 6 is an area confirmation screen for the user to confirm whether the detection target area setting rule suitable for the camera and the detection target is set.

図６に示す画面には、例えばＵＩ５０１～５０３およびＵＩ６０１および６０２がある。カメラ選択ＵＩ５０１、検知対象選択ＵＩ５０２、サイズ入力ＵＩ５０３は、図５と同様であるから説明を省略する。 The screen shown in FIG. 6 includes UIs 501 to 503 and UIs 601 and 602, for example. The camera selection UI 501, detection target selection UI 502, and size input UI 503 are the same as in FIG.

The area UI 601 displays the image-coordinate detection target area derived in step S308 or step S324. The camera parameter UI 602 displays the camera parameters used to derive the detection target area in step S308 or step S324. Although the camera parameter UI 602 in FIG. 6 shows the matrix C as the camera parameters, it may instead display values converted by a known method into parameters that include one or more of the camera's tilt angle, pan angle, angle of view, installation height, and the like.

ユーザは、領域ＵＩ６０１により、領域設定部１２４により導出された検知対象領域が所望の領域となっているか確認することができる。さらに、検知対象領域またはカメラパラメータのいずれかを修正したい場合には、領域ＵＩ６０１に表示された領域やカメラパラメータＵＩ６０２に表示されたカメラパラメータを直接編集することができる。 The user can confirm through the area UI 601 whether the detection target area derived by the area setting unit 124 is a desired area. Furthermore, when it is desired to modify either the detection target area or the camera parameters, the area displayed on the area UI 601 and the camera parameters displayed on the camera parameter UI 602 can be directly edited.

次に、実施例２の映像解析システム１００について説明する。その際、実施例１との相違点を主に説明し、実施例１との共通点については説明を省略または簡略する。 Next, the video analysis system 100 of Example 2 will be described. At that time, the points of difference from the first embodiment will be mainly described, and the explanations of the points in common with the first embodiment will be omitted or simplified.

本実施例では、リバーシブルレーン（中央線変移）とバス専用レーンを有する車道に対し、時間により異なる検知対象領域を設定する例について説明する。実施例２において、以降に記載する内容以外の部分は、全て実施例１記載の映像解析システム１００と共通である。 In this embodiment, an example will be described in which different detection target regions are set depending on time on a roadway having a reversible lane (center line transition) and a bus-only lane. In the second embodiment, all parts other than the contents described below are common to the video analysis system 100 described in the first embodiment.

FIG. 7A is an image, captured by a camera above the center of the roadway, of a roadway with two lanes on the left and three lanes on the right, one of the right-hand lanes being a bus-only lane. FIG. 7B shows the same location, but captured when the center line has shifted because of the reversible lane, giving three lanes on the left and two lanes on the right.

リバーシブルレーンとリバーシブルレーンに伴う曜日および時間帯指定のバス専用レーンがある車道では、図７Ａ、図７Ｂのように、時間および曜日により通行区分が切り替わる。このような車道において、進行方向違反や車両区分違反を検知するためには、進行方向と車両区分の変化に対応できるよう、時間および曜日により異なる検知対象領域を設定する必要がある。 On a roadway with reversible lanes and dedicated bus lanes with designated days of the week and time slots associated with the reversible lanes, traffic classifications change depending on the time and day of the week, as shown in FIGS. 7A and 7B. In order to detect traveling direction violations and vehicle classification violations on such roads, it is necessary to set different detection target areas depending on the time and day of the week so as to be able to respond to changes in the traveling direction and vehicle classification.

図７Ｃ～図７Ｅを用いて、図７Ａまたは図７Ｂの背景画像のフレームに対し、映像解析システム１００を用いて検知対象領域を抽出する例について説明する。背景画像は、図７Ａおよび図７Ｂのいずれであってもよい。 An example of extracting a detection target region using the video analysis system 100 for the frame of the background image in FIG. 7A or 7B will be described with reference to FIGS. 7C to 7E. The background image may be either of Figures 7A and 7B.

図７Ｃのハッチング領域は、ステップＳ３０５において領域推定部１２３が図７Ａまたは図７Ｂを基に車道を推定し、ステップＳ３０７により、車道推定領域が画像座標から世界座標に変換された結果である。図７Ｃに記載のY軸は、図７Ａまたは図７Ｂの車道における車両の進行方向であり、図７Ｃに記載のX軸は、図７Ａまたは図７Ｂの車道を横断する方向である。 The hatched area in FIG. 7C is the result of the area estimating unit 123 estimating the roadway based on FIG. 7A or 7B in step S305, and converting the estimated roadway area from image coordinates to world coordinates in step S307. The Y-axis shown in FIG. 7C is the traveling direction of the vehicle on the roadway of FIG. 7A or 7B, and the X-axis shown in FIG. 7C is the crossing direction of the roadway of FIG. 7A or 7B.

図７Ｄは領域設定部１２４が検知対象領域を設定する際に参照する設定ルールを表として示したものである。図７Ｄの表は、時間帯によって領域設定のルールが変わることと、それぞれの領域が対応する車線および通行区分を示している。 FIG. 7D shows a table of setting rules that the area setting unit 124 refers to when setting the detection target area. The table in FIG. 7D shows that the rules for area setting change depending on the time period, and the corresponding lanes and traffic divisions for each area.

In Example 2, as shown in FIG. 7C, the area estimation unit 123 estimates only the roadway area, and the area setting unit 124 uses the time-slot-specific area setting rules of FIG. 7D to derive, from the roadway area, detection target areas for locations that switch with the time of day as in FIGS. 7A and 7B. FIG. 7D shows time-slot-specific area setting rules for a single location, but the rules may be further subdivided by data indicating a camera ID or camera installation location, or by day of the week, weather, temperature, and so on. Setting rules such as those in FIG. 7D may be held by the area setting unit 124, but in this embodiment they are stored in the management information store 10 in the storage device 202. The data for the setting rules may be in any format; for example, it may be text data or a database. The data may be registered in advance in the area setting unit 124 or the storage device 202, or may be accepted as user input through the input device 103.

The three hatched areas in FIG. 7E are the areas for the two left lanes, the two right lanes, and the one right lane, derived by the area setting unit 124 from the estimated roadway area converted to world coordinates in FIG. 7C, in accordance with row No. 1 of the area setting rules in FIG. 7D. The area setting rules of FIG. 7D describe how the estimated roadway area is divided. Taking No. 1 as an example, the area from 0 to 40% of the distance from the left edge in the X-axis direction is the left-lane area, the area from 40 to 80% is the right-lane area, and the area from 80 to 100% is a right lane that is also the bus-only lane. Similarly, the two hatched areas in FIG. 7F are the areas for the three left lanes and the two right lanes, derived by the area setting unit 124 from the estimated roadway area converted to world coordinates in FIG. 7C, in accordance with row No. 2 of the area setting rules in FIG. 7D.
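
The percentage-based division can be sketched as a small rule table plus a splitting function. The 0-40%, 40-80%, and 80-100% bands for rule No. 1 come from the description above; the start times and the 0-60% and 60-100% bands for rule No. 2 are placeholders invented for illustration, since the contents of FIG. 7D are not reproduced in the text. The roadway is assumed to be an axis-aligned rectangle in world coordinates, and all names are hypothetical.

```python
from datetime import time
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Band = Tuple[float, float, str]  # (start %, end %, label) along the X axis

RULES = [
    # Rule No. 1: band percentages as described in the text; start time is a placeholder.
    {"no": 1, "start": time(0, 0),
     "bands": [(0, 40, "left lanes"), (40, 80, "right lanes"), (80, 100, "right bus-only lane")]},
    # Rule No. 2: start time and band percentages are placeholders.
    {"no": 2, "start": time(12, 0),
     "bands": [(0, 60, "left lanes"), (60, 100, "right lanes")]},
]


def select_rule(now: time, rules: List[dict] = RULES) -> dict:
    """Pick the rule with the latest start time that is not after `now`."""
    active = [r for r in rules if r["start"] <= now]
    if not active:
        raise LookupError("no rule covers this time")
    return max(active, key=lambda r: r["start"])


def split_roadway(
    x_left: float, x_right: float, y_near: float, y_far: float, bands: List[Band]
) -> Dict[str, List[Point]]:
    """Divide a rectangular world-coordinate roadway into labeled X bands."""
    width = x_right - x_left
    regions: Dict[str, List[Point]] = {}
    for lo, hi, label in bands:
        xa = x_left + width * lo / 100
        xb = x_left + width * hi / 100
        regions[label] = [(xa, y_near), (xb, y_near), (xb, y_far), (xa, y_far)]
    return regions
```

For example, `split_roadway(0.0, 15.0, 0.0, 50.0, select_rule(time(8, 0))["bands"])` returns three rectangles whose X ranges are 0 to 6, 6 to 12, and 12 to 15. Each rectangle would then be converted to image coordinates to obtain the detection target areas.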

In step S307, the regions obtained in FIGS. 7E and 7F are converted to image coordinates, and the region setting unit 124 thereby derives the detection target regions.

Thus, the processing described above makes it possible to set detection target regions that change with the time, day of the week, weather, temperature, and so on, based on the region estimated by the region estimation unit 123.

The description of Examples 1 and 2 above can be summarized, for example, as follows. The summary may include matters not covered in the above description (for example, modifications).

The server computer 110, which is an example of a video analysis support device, has a video input unit 111, a region estimation unit 123, and a region derivation unit 30. The video input unit 111 inputs video in the image coordinate system captured by the video capturing device 101 (an example of a camera), for example from the video capturing device 101 (or the video storage device 102). The region estimation unit 123 estimates the corresponding region in the video based on the input video in the image coordinate system. The region derivation unit 30 derives the corresponding region in the world coordinate system by converting the coordinates of the estimated corresponding region from image coordinates to world coordinates using camera parameters, derives a detection target region in the world coordinate system based on the derived corresponding region, and derives a detection target region in the image coordinate system by converting the coordinates of that detection target region from world coordinates to image coordinates using the camera parameters. In this way, even if the video is a single still image and the detection target does not appear in it, the corresponding region in the world coordinate system is derived using the camera parameters, a detection target region in the world coordinate system is derived from that corresponding region, and the same camera parameters are used to derive a detection target region in the image coordinate system that can be used as a mask region (or for other purposes). This makes it possible to derive an appropriate detection target region even when the video is a single still image in which the detection target does not appear.
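The three derivation steps summarized above (image to world, setting rule in world coordinates, world to image) can be sketched as follows. This is only an illustration under stated assumptions: the camera parameters are reduced to a single 3x3 ground-plane homography `img_to_world`, the setting rule is a plain translation offset, and all function names and numeric values are hypothetical rather than taken from the source.

```python
def apply_h(h, pt):
    """Apply a 3x3 homography to a 2D point (homogeneous divide included)."""
    x, y = pt
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def invert_3x3(m):
    """Invert a 3x3 matrix via the adjugate; raises if it is singular."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is not invertible")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def derive_detection_region(image_polygon, img_to_world, offset):
    """Return (world_region, world_target, image_target) polygons."""
    # Step 1: estimated region, image coordinates -> world coordinates.
    world_region = [apply_h(img_to_world, p) for p in image_polygon]
    # Step 2: setting rule (assumed here to be a simple offset in metres).
    dx, dy = offset
    world_target = [(x + dx, y + dy) for x, y in world_region]
    # Step 3: detection target region, world -> image, same camera parameters.
    world_to_img = invert_3x3(img_to_world)
    image_target = [apply_h(world_to_img, p) for p in world_target]
    return world_region, world_target, image_target


# Assumed values: 1 px = 0.01 m in X and 0.02 m in Y, no perspective terms.
H = [[0.01, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 1.0]]
estimated = [(100, 200), (300, 200), (300, 400), (100, 400)]
world_region, world_target, image_target = derive_detection_region(
    estimated, H, offset=(2.0, 0.0)
)
```

Because the offset is applied in world coordinates, it keeps a fixed physical size regardless of where the region sits in the image, which is the motivation for round-tripping through the world coordinate system.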

The detection target region in the world coordinate system may be derived by applying a setting rule, which is a rule for changing the world coordinates of the corresponding region in the world coordinate system (for example, a rule that includes an offset applied to the world coordinates of the corresponding region). Because the detection target region in the world coordinate system is derived from the corresponding region in the world coordinate system, that detection target region can be expected to be appropriate, and an appropriate detection target region in the image coordinate system can therefore be expected as well.

The region derivation unit 30 may provide a setting UI, which is a user interface that accepts specification of information that determines the changed world coordinates of the corresponding region in the world coordinate system (for example, an offset applied to the world coordinates of the corresponding region, or the changed world coordinates themselves). The applied setting rule may be a setting rule that follows the information specified via the setting UI. This can be expected to make the detection target region in the world coordinate system, derived by applying the setting rule to the corresponding region, appropriate.

The region derivation unit 30 may provide a region UI 601, which is a user interface that outputs at least one of the detection target region in the world coordinate system set using the setting rule that follows the information specified via the setting UI, and the detection target region in the image coordinate system based on that world-coordinate detection target region. By viewing the detection target region output on the region UI 601, the user can judge whether the information specified via the setting UI is appropriate.

When the specified information is changed via the setting UI, the region derivation unit 30 may change the region output on the region UI based on the setting rule that follows the changed information. This lets the user judge whether the changed information is appropriate.

The region derivation unit 30 may store, in the storage device 202, at least one of the detection target region in the world coordinate system and the detection target region in the image coordinate system, together with the camera parameters that were used to derive that region and are associated with it. The region derivation unit 30 may provide a camera parameter UI 602, which is a user interface that outputs the camera parameters associated with the region output on the region UI 601 and accepts changes to those camera parameters. This lets the user know which camera parameters affected the detection target region.

In at least one of the case where the specified information is changed via the setting UI and the case where the camera parameters are changed via the camera parameter UI 602, the region derivation unit 30 may change the region output on the region UI 601 based on that change. This lets the user judge whether the changed information or camera parameters are appropriate.

For each of a plurality of time slots, there may be one or more setting rules, each applied to set one of one or more detection target regions in the world coordinate system for that time slot (see, for example, FIG. 7D). The region derivation unit 30 may derive one or more detection target regions in the world coordinate system using the one or more setting rules corresponding to the time slot to which the time of setting the detection target region belongs. This can be expected to yield detection target regions in the world coordinate system that suit the environment of the imaging range.
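Selecting setting rules by time slot can be sketched as a simple table lookup. The slot boundaries, the rule identifiers, and the `rules_for` function below are hypothetical; the source does not state the actual contents of FIG. 7D in this section.

```python
from datetime import time

# Hypothetical table: (slot start, slot end, setting rule identifiers).
# Each slot is half-open: start <= t < end.
RULES_BY_SLOT = [
    (time(7, 0), time(9, 0), ["rule_no_1"]),
    (time(9, 0), time(17, 0), ["rule_no_2"]),
]


def rules_for(now, table):
    """Return the setting rules of the slot that contains `now`."""
    for start, end, rules in table:
        if start <= now < end:
            return rules
    return []  # no slot covers this time of day


morning_rules = rules_for(time(8, 30), RULES_BY_SLOT)
```

The same lookup could be keyed additionally by camera ID, day of the week, or weather, in line with the further subdivisions the description allows.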

The region derivation unit 30 may provide a setting UI, which is a user interface that accepts specification of information relating to the range captured by the video capturing device 101. The applied setting rule may be a setting rule determined from the information specified via the setting UI. When the user knows the information about the imaging range better than the details of the setting rules, specifying that information causes the region derivation unit 30 to generate an appropriate setting rule, which can be more convenient for the user. When this mechanism is adopted, the storage device 202 of the server computer 110 may store conversion information, which is information indicating the association between information relating to the range captured by the video capturing device 101 and a plurality of setting rules. The conversion information may be part of the management information. When information relating to the range captured by the video capturing device 101 is specified via the setting UI, the region derivation unit 30 may identify one or more setting rules corresponding to the specified information from the conversion information, and may set the identified setting rules or display them on the setting UI. The information specified via the setting UI (information about the imaging range) may be at least one of the following:
・the shooting time,
・the shooting location,
・the weather at the shooting time and shooting location, and
・the size of the detection target, or a target attribute that affects that size.
Such information is considered easy for the user to grasp, and an improvement in convenience for the user can therefore be expected.
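The role of the conversion information can be illustrated with a minimal lookup. The structure below (a list of condition/rule pairs), the field names, and the rule identifiers are assumptions made for this sketch; the source only states that the conversion information associates imaging-range information with setting rules.

```python
# Hypothetical conversion information: each entry pairs a condition on the
# user-specified imaging-range information with setting rule identifiers.
CONVERSION_INFO = [
    ({"location": "station_platform", "weather": "rain"}, ["offset_small"]),
    ({"location": "station_platform"}, ["offset_default"]),
    ({"location": "parking_lot"}, ["offset_wide"]),
]


def lookup_rules(specified, conversion_info):
    """Return every setting rule whose condition matches the specified info."""
    matched = []
    for condition, rules in conversion_info:
        if all(specified.get(key) == value for key, value in condition.items()):
            matched.extend(rules)
    return matched


selected = lookup_rules(
    {"location": "station_platform", "weather": "rain"}, CONVERSION_INFO
)
```

Returning all matching rules, rather than only the first, reflects the statement that one or more setting rules may be identified and then either set or shown on the setting UI for the user to choose from.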

A video analysis unit 131 may further be provided. The video analysis unit 131 may use the derived detection target region in the image coordinate system to perform video analysis processing that includes determining whether a detection target appears in the captured and input video in the image coordinate system. Because the video analysis unit 131 and the process of deriving the image-coordinate detection target region used in its video analysis processing run on the same device, an improvement in convenience can be expected.

A camera parameter calculation unit 121 may further be provided. The camera parameter calculation unit 121 may calculate (estimate) camera parameters based on the video input by the video input unit 111. The camera parameters used both in deriving the corresponding region in the world coordinate system and in deriving the detection target region in the image coordinate system may be those calculated by the camera parameter calculation unit 121. Because camera parameters calculated from video captured after a change in imaging conditions (for example, the camera's orientation, magnification (angle of view), or position) change accordingly, appropriate detection target regions can continue to be derived even when the imaging conditions change.

The region derivation unit 30 may store, in the storage device 202, at least one of the detection target region in the world coordinate system and the detection target region in the image coordinate system, together with the camera parameters that were used to derive that region and are associated with it. If the camera parameters associated with the detection target region in the image coordinate system differ from the calculated camera parameters, the region derivation unit 30 may change the image-coordinate detection target region in the storage device based on the calculated camera parameters. In this way, appropriate detection target regions can continue to be derived even when the camera parameters change with the imaging conditions.
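The update of a stored region when the calculated camera parameters differ can be sketched as below. The sketch assumes that both the world-coordinate and image-coordinate regions are stored, and it reduces the camera parameters to a hypothetical scale-and-origin tuple; a real system would use a full projection model. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class StoredRegion:
    """A stored detection target region with its associated camera parameters."""
    world_polygon: list   # [(x_m, y_m), ...] in world coordinates
    image_polygon: list   # [(u_px, v_px), ...] in image coordinates
    camera_params: tuple  # hypothetical: (pixels_per_metre, origin_u, origin_v)


def world_to_image(polygon, params):
    """Project world points to image points with the simplified parameters."""
    scale, origin_u, origin_v = params
    return [(origin_u + scale * x, origin_v + scale * y) for x, y in polygon]


def refresh_if_changed(stored, calculated_params):
    """Recompute the image-coordinate region if the parameters differ.

    Returns True when the stored region was updated.
    """
    if stored.camera_params == calculated_params:
        return False
    stored.image_polygon = world_to_image(stored.world_polygon, calculated_params)
    stored.camera_params = calculated_params
    return True


region = StoredRegion(
    world_polygon=[(1.0, 2.0), (3.0, 2.0)],
    image_polygon=[(100.0, 200.0), (300.0, 200.0)],
    camera_params=(100.0, 0.0, 0.0),
)
unchanged = refresh_if_changed(region, (100.0, 0.0, 0.0))
updated = refresh_if_changed(region, (50.0, 10.0, 0.0))
```

Keeping the world-coordinate region alongside the image-coordinate one is what makes this refresh cheap: only the final projection has to be repeated when the camera moves or zooms.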

Several examples have been described above, but they are illustrations for explaining the present invention and are not intended to limit the scope of the present invention to these examples alone. The present invention can also be carried out in various other forms.

100: Video analysis system

Claims

A video analysis support device comprising:
a video input unit that inputs video in an image coordinate system captured by a camera;
a region estimation unit that estimates a corresponding region in the video based on the input video in the image coordinate system, using a network model trained using images having a plurality of pieces of correct information; and
a region derivation unit that derives the corresponding region in a world coordinate system by converting the coordinates of the estimated corresponding region from image coordinates to world coordinates using camera parameters; that derives, by offset correction of the derived corresponding region, a detection target region in the world coordinate system by applying to the derived corresponding region a setting rule for setting a region in world coordinates different from the corresponding region estimated by the region estimation unit, the detection target region being a different region in world coordinates that does not include the derived corresponding region and in which a detection target may appear; and that derives a detection target region in the image coordinate system by converting the coordinates of the derived detection target region from world coordinates to image coordinates using the camera parameters.

The video analysis support device according to claim 1, wherein
the region derivation unit provides a setting UI, which is a user interface that accepts specification of information that determines the changed world coordinates of the corresponding region in the world coordinate system, and
the applied setting rule is a setting rule that follows the information specified via the setting UI.

The video analysis support device according to claim 2, wherein
the region derivation unit provides a region UI, which is a user interface that outputs at least one of the detection target region in the world coordinate system derived using the setting rule that follows the information specified via the setting UI, and the detection target region in the image coordinate system based on that detection target region in the world coordinate system.

The video analysis support device according to claim 3, wherein,
when the specified information is changed via the setting UI, the region derivation unit changes the region output on the region UI based on the setting rule that follows the changed information.

The video analysis support device according to claim 3, wherein
the region derivation unit stores, in a storage device, at least one of the detection target region in the world coordinate system and the detection target region in the image coordinate system, together with the camera parameters that were used to derive that region and are associated with it, and
the region derivation unit provides a camera parameter UI, which is a user interface that outputs the camera parameters associated with the region output on the region UI and accepts changes to those camera parameters.

The video analysis support device according to claim 5, wherein,
in at least one of a case where the specified information is changed via the setting UI and a case where the camera parameters are changed via the camera parameter UI, the region derivation unit changes the region output on the region UI based on that change.

The video analysis support device according to claim 1, wherein,
for each of a plurality of time slots, there are one or more setting rules, each applied to derive one of one or more detection target regions in the world coordinate system for that time slot, and
the region derivation unit derives one or more detection target regions in the world coordinate system using the one or more setting rules corresponding to the time slot to which the time of setting the detection target region in the world coordinate system belongs.

The video analysis support device according to claim 1, wherein
the region derivation unit provides a setting UI, which is a user interface that accepts specification of information relating to the range captured by the camera, and
the applied setting rule is a setting rule determined from the information specified via the setting UI.

The video analysis support device according to claim 1, further comprising
a video analysis unit that uses the derived detection target region in the image coordinate system to perform video analysis processing including determining whether the detection target appears in the captured and input video in the image coordinate system.

The video analysis support device according to claim 1, further comprising
a camera parameter calculation unit that calculates camera parameters based on the input video, wherein
the camera parameters used in each of the derivation of the corresponding region in the world coordinate system and the derivation of the detection target region in the image coordinate system are the calculated camera parameters.

The video analysis support device according to claim 10, wherein
the region derivation unit stores, in a storage device, at least one of the detection target region in the world coordinate system and the detection target region in the image coordinate system, together with the camera parameters that were used to derive that region and are associated with it, and,
if the camera parameters associated with the detection target region in the image coordinate system differ from the calculated camera parameters, the region derivation unit changes the detection target region in the image coordinate system in the storage device based on the calculated camera parameters.

A video analysis support method comprising:
a step in which a computer inputs video in an image coordinate system captured by a camera;
a step in which the computer estimates a corresponding region in the video based on the input video in the image coordinate system, using a network model trained using images having a plurality of pieces of correct information;
a step in which the computer derives the corresponding region in a world coordinate system by converting the coordinates of the estimated corresponding region from image coordinates to world coordinates using camera parameters;
a step in which the computer derives, by offset correction of the derived corresponding region, a detection target region in the world coordinate system by applying to the derived corresponding region a setting rule for setting a region in world coordinates different from the corresponding region estimated in the estimating step, the detection target region being a different region in world coordinates that does not include the derived corresponding region and in which a detection target may appear; and
a step in which the computer derives a detection target region in the image coordinate system by converting the coordinates of the derived detection target region from world coordinates to image coordinates using the camera parameters.