JP2021156764A

JP2021156764A - Calibration device, moving body identification device used in the calibration device, training device therefor, and computer programs therefor

Info

Publication number: JP2021156764A
Application number: JP2020058019A
Authority: JP
Inventors: Chaoran Liu (劉 超然); Carlos Toshinori Ishi (カルロス トシノリ イシイ)
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2020-03-27
Filing date: 2020-03-27
Publication date: 2021-10-07
Anticipated expiration: 2040-03-27
Also published as: JP7497188B2

Abstract

PROBLEM TO BE SOLVED: To automatically calibrate sensors of different types.
SOLUTION: A calibration device includes: a training data collection device that acquires first and second time-series data on the positions of a predetermined number of moving bodies, measured by two sensors, respectively; a moving body identification device trained to receive these time-series data and, for each combination of a first moving body represented by the first time-series data and a second moving body represented by the second time-series data, to receive the position time series of the first and second moving bodies and output a score indicating whether the first and second moving bodies are the same moving body; and a sensor calibration device that estimates, based on that output, the correspondence between the moving bodies represented by the first and second time-series data and, using that correspondence, calibrates the position and orientation of the second sensor relative to the first sensor so that the output errors of the first and second sensors for each moving body satisfy a predetermined condition.
[Selected drawing] Fig. 5

Description

The present invention relates to techniques for calibrating moving body sensors, and more particularly to a technique for identifying moving bodies, such as multiple persons detected by multiple moving body sensors, in order to calibrate those sensors.

RGB-D sensors and microphone arrays are widely used to represent visual and auditory environments in a machine-understandable form. Covering the full extent of an environment requires several such sensors. When multiple sensors are used, information on the position and orientation of each sensor is needed so that their outputs can be shared and combined properly. However, manually calibrating sensors of various types for this purpose is tedious and time-consuming. A technique that can calibrate multiple sensors without manual effort is therefore desirable.

Non-Patent Document 1, listed below, proposes a skeleton-based viewpoint invariance transformation for deriving the position and orientation of RGB-D sensors from position information of a human body. In this transformation, the relative position and orientation of two adjacent sensors are computed from a common human body (skeleton) observed by both sensors.

Non-Patent Document 2, listed below, proposes an algorithm that calibrates RGB-D sensors, and automatically recalibrates them, using information on the observed positions of skeleton joints.

Meanwhile, microphone arrays are widely used to perceive the auditory environment and to improve robot audition. However, most techniques for perceiving an environment with microphone arrays rely on manual calibration, and only a few techniques exist for automatically calibrating multiple microphone arrays.

Non-Patent Document 1: Y. Han, S.-L. Chung, J.-S. Yeh, and Q.-J. Chen, "Localization of rgb-d camera skeleton-based viewpoint invariance transformation," vol. 63, 10 2013, pp. 1525-1530.
Non-Patent Document 2: K. Desai, B. Prabhakaran, and S. Raghuraman, "Skeleton-based continuous extrinsic calibration of multiple rgb-d kinect cameras," in Proceedings of the 9th ACM Multimedia Systems Conference, ser. MMSys '18. New York, NY, USA: ACM, 2018, pp. 250-257. [Online]. Available: http://doi.acm.org/10.1145/3204949.3204969
Non-Patent Document 3: J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, "Neural message passing for quantum chemistry," in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML '17. JMLR.org, 2017, pp. 1263-1272.
Non-Patent Document 4: S. Agarwal, N. Snavely, S. M. Seitz, and R. Szeliski, "Bundle adjustment in the large," in Proceedings of the 11th European Conference on Computer Vision: Part II, ser. ECCV '10. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 29-42.

As described above, techniques for automatically calibrating multiple sensors have been limited to sensors of the same type. No technique exists for automatically calibrating sensors of different types, such as an RGB-D sensor and a microphone array.

An object of the present invention is therefore to provide a calibration device that automatically calibrates sensors of different types, a moving body identification device used in that calibration device, a training device therefor, and computer programs therefor.

A calibration device according to a first aspect of the present invention calibrates the positions and orientations of a first sensor and a second sensor, each capable of detecting and outputting the positions of multiple moving bodies as a discrete time series. The calibration device includes: an acquisition unit that acquires first time-series data and second time-series data on the positions of a predetermined number of moving bodies, measured over a predetermined time by the first sensor and the second sensor, respectively; moving body identification means consisting of a neural network trained in advance to take the first and second time-series data as input and, for each combination of a first moving body represented by the first time-series data and a second moving body represented by the second time-series data, to receive the position time series of the first moving body in the first time-series data and the position time series of the second moving body in the second time-series data and to output a score indicating whether the first and second moving bodies forming that combination are the same moving body; and sensor calibration means that estimates, based on the output of the moving body identification means, the correspondence between each moving body represented by the first time-series data and each moving body represented by the second time-series data and, using that correspondence, calibrates the position and orientation of the second sensor relative to the first sensor so that the output errors between the first sensor and the second sensor for each moving body satisfy a predetermined condition.

Preferably, the sensor calibration means includes minimization means that calibrates the position and orientation of the second sensor so as to minimize the sum of the output errors.

More preferably, the calibration device further includes parallel training means that uses the first and second time-series data to train the moving body identification means in parallel with the operation of the correspondence estimation means.

Still more preferably, the parallel training means includes: the moving body identification means; a decoder that takes as input the output of the moving body identification means and the position data of the same time step from each of the first and second time-series data; and adjustment means that trains the moving body identification means by adjusting the parameters of the moving body identification means and the decoder so that, over a predetermined range of the first and second time-series data, the output of the decoder approaches the same-time-step position data input to the decoder.

Preferably, the adjustment means includes error backpropagation means that trains the moving body identification means by adjusting the parameters of the moving body identification means and the decoder through error backpropagation, using the first and second time-series data over the entire predetermined time and the error between the output of the decoder and the same-time-step position data input to the decoder.

More preferably, the parallel training means trains the moving body identification means each time first time-series data and second time-series data are provided.

A moving body identification device according to a second aspect of the present invention consists of a neural network trained in advance to take as input first time-series data and second time-series data on the positions of a first moving body and a second moving body, measured over a predetermined time by a first sensor and a second sensor, respectively, and to output a score indicating whether the first moving body and the second moving body are the same moving body.

Preferably, each of the first and second time-series data includes position data of the target moving body at predetermined time intervals, and each item of position data includes the position and velocity of the target moving body and time information indicating when that position and velocity were measured.

More preferably, the neural network is a multi-layer neural network having a plurality of inputs that receive the positions and velocities contained in the first time-series data and the positions and velocities contained in the second time-series data, and an output that outputs a probability.
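The data layout described in the two preceding paragraphs can be pictured with a minimal sketch. The names Sample and pair_features, the use of three-dimensional tuples, and the ordering of the flattened vector are all hypothetical; the text states only that each record holds a position, a velocity, and a time, and that the network receives the positions and velocities of both series.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass(frozen=True)
class Sample:
    """One time-step record for one moving body (hypothetical layout)."""
    time: float      # time at which position and velocity were measured
    position: Vec3   # position in the measuring sensor's local coordinates
    velocity: Vec3   # velocity at that time

def pair_features(track_a: List[Sample], track_b: List[Sample]) -> List[float]:
    """Flatten the positions and velocities of two equally long tracks
    into a single input vector (ordering is an assumption)."""
    if len(track_a) != len(track_b):
        raise ValueError("tracks must cover the same time steps")
    features: List[float] = []
    for a, b in zip(track_a, track_b):
        features.extend(a.position + a.velocity + b.position + b.velocity)
    return features

# Made-up tracks of two time steps each, one per sensor
track_a = [Sample(0.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
           Sample(0.1, (0.1, 0.0, 0.0), (1.0, 0.0, 0.0))]
track_b = [Sample(0.0, (2.0, 1.0, 0.0), (0.0, 1.0, 0.0)),
           Sample(0.1, (2.0, 1.1, 0.0), (0.0, 1.0, 0.0))]
features = pair_features(track_a, track_b)
```

Each time step contributes twelve numbers here (position and velocity for both tracks), so the input width grows with the number of time steps.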

A training device according to a third aspect of the present invention includes: a time-series data acquisition unit that acquires, for each of multiple moving bodies, a time series of position data obtained at predetermined time steps over a predetermined time; position data extraction means that extracts, from the time series of position data acquired by the time-series data acquisition unit, the position data at a designated position in the sequence, acquired at the same time; a first neural network having inputs whose number is determined by the predetermined number of time steps and at least one output; a second neural network having inputs and outputs equal in number, that number being determined by the position data constituting the time series; input means that extracts all possible combinations of two moving bodies from the multiple moving bodies and supplies, as input to the first neural network, the position time series of the moving bodies constituting each extracted combination; first sampling means that samples the value output by the first neural network in response to the input; selection means that selects, from among the values sampled by the first sampling means for the possible combinations, the combination that yielded the largest value; second sampling means that inputs to the second neural network the position data, among those extracted by the position data extraction means, of the two moving bodies corresponding to the combination selected by the selection means, and samples the output of the second neural network; parameter adjustment means that adjusts the parameters of the first neural network and the second neural network by error backpropagation so as to reduce the error between the two items of moving body position data supplied to the input of the second neural network and the value sampled by the second sampling means from the output of the second neural network; first repeated execution means that operates the position data extraction means, the first neural network, the input means, the first sampling means, the selection means, the second sampling means, and the parameter adjustment means repeatedly, designating position data in order from the beginning of the time series until the time-series data ends; second repeated execution means that repeats the iteration by the first repeated execution means until a predetermined termination condition is satisfied; and parameter storage means that stores, in a predetermined storage device, the parameters of the first neural network at the time the iteration by the second repeated execution means ends.

Preferably, the parameter adjustment means includes: error accumulation means that accumulates, a predetermined number of times, the error between the two items of moving body position data supplied to the input of the second neural network and the value sampled by the second sampling means from the output of the second neural network; and batch adjustment means that, after the first sampling means and the second sampling means have operated the predetermined number of times, adjusts the parameters of the first neural network and the second neural network by error backpropagation as a batch process so as to reduce the error accumulated by the error accumulation means.

A computer program according to a fourth aspect of the present invention causes a computer to function as any of the calibration devices described above.

A computer program according to a fifth aspect of the present invention causes a computer to function as any of the moving body identification devices described above.

A computer program according to a sixth aspect of the present invention causes a computer to function as any of the training devices described above.

The above and other objects, features, aspects, and advantages of the present invention will become apparent from the following detailed description of the invention, read in conjunction with the accompanying drawings.

FIG. 1 is a diagram schematically showing the configuration of a sensor system.
FIG. 2 is a schematic diagram showing a calibration method between two sensors.
FIG. 3 is a diagram schematically showing a factor graph that represents the calibration process.
FIG. 4 is a diagram schematically showing a factor graph that, in the calibration process, has edges only between identical moving bodies.
FIG. 5 is a block diagram outlining the overall configuration of an automatic sensor calibration system according to an embodiment of the present invention.
FIG. 6 is a diagram showing the external appearance of a computer that implements the automatic calibration system shown in FIG. 5.
FIG. 7 is a block diagram showing the hardware configuration of the computer whose appearance is shown in FIG. 6.
FIG. 8 is a diagram schematically showing the configuration of the autoencoder shown in FIG. 5.
FIG. 9 is a diagram schematically showing the configuration of the message passing neural network disclosed in Non-Patent Document 3.
FIG. 10 is a diagram schematically showing the configuration of an encoder that forms part of the autoencoder used in the embodiment of the present invention and that was inspired by the neural network shown in FIG. 9.
FIG. 11 is a flowchart outlining the control structure of a computer program that causes the computer shown in FIGS. 6 and 7 to function as a training device for training the encoder shown in FIG. 10.
FIG. 12 is a flowchart outlining the control structure of a computer program that causes the computer shown in FIGS. 6 and 7 to function as a calibration device that calibrates multiple sensors using the encoder trained by the program shown in FIG. 8.
FIG. 13 is a flowchart outlining the control structure of a computer program that causes the computer shown in FIGS. 6 and 7 to function as an online training device that trains the encoder shown in FIG. 10 online while it is in operation.
FIG. 14 is a flowchart outlining the control structure of a computer program that causes the computer shown in FIGS. 6 and 7 to function as a training device, according to a second embodiment of the present invention, for training the encoder shown in FIG. 10.

In the following description and drawings, identical parts are given identical reference numbers, and detailed descriptions of them are not repeated. To aid understanding, the embodiments below describe cases in which two or three sensors are calibrated at the same time, unless otherwise noted. The invention is not limited to such embodiments, however, and can be implemented in the same way when four or more sensors are to be calibrated. The description also assumes that the moving objects (moving bodies) are humans, but they are not necessarily limited to humans.

1. First Embodiment
1 Configuration
(1) Background
FIG. 1 shows the schematic configuration of sensor system 50, which uses the calibration device according to the first embodiment of the present invention. Referring to FIG. 1, sensor system 50 includes two RGB-D sensors 60 and 62 and one microphone array 64; it detects the positions of two target persons 66 and 68 and outputs time-series position data. RGB-D sensors 60 and 62 can capture an RGB image and measure the distance from the sensor to a target, and each outputs the three-dimensional coordinates of the detected target persons 66 and 68 in its own local coordinates, whose origin is the position of that sensor. Microphone array 64 is a two-dimensional sensor and outputs the two-dimensional coordinates it obtains for target persons 66 and 68 in local coordinates whose origin is the position of microphone array 64. Note that a microphone array can detect the position of a person only while that person is speaking. Consequently, when the training data and calibration data described later are collected for a sensor system that includes a microphone array, the persons must speak as they walk around the predetermined area.

If the position and orientation of each local sensor are known exactly, the global coordinates of the persons detected by each sensor, and the correspondence among them, are also known. The outputs of these sensors can then be combined so that a computer can readily manage the audiovisual environment in which the sensors are placed.

If the positions or orientations of these sensors are unknown, however, the output of each sensor must be calibrated so that the coordinate values it outputs can be converted into common global coordinates.

(2) Sensor calibration
Referring to FIG. 2, suppose for example that local coordinates 80 of RGB-D sensor 60 are adopted as the global coordinates. In this case, the local coordinates of RGB-D sensor 62 are translated so that their origin coincides with the origin of local coordinates 80, as represented by local coordinates 82. The local coordinates are then rotated so that each axis of local coordinates 82 (represented by the unit vectors e_1', e_2', and e_3') coincides with the corresponding axis of local coordinates 80 of RGB-D sensor 60 (represented by the unit vectors e_1, e_2, and e_3). Denoting the coordinate transformation due to this translation by t_2 and that due to the rotation by R_2, the transformation from the local coordinates of RGB-D sensor 62 into local coordinates 80 of RGB-D sensor 60 is ordinarily expressed by the following equation.

Here, R_2 is a 3×3 rotation matrix, t_2 is a 3×1 translation vector, and O is a 3×1 zero vector. The rotation matrix R_2 is an element of the three-dimensional special orthogonal group SO(3).
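The equation referred to above was an image and does not appear in this text. A minimal sketch, assuming the usual homogeneous form in which the 4×4 matrix [[R_2, t_2], [O^T, 1]] maps the vector (p_2, 1) to (p_1, 1); that form is consistent with the stated sizes of R_2, t_2, and O but is an assumption, and the function names and numerical values are hypothetical.

```python
import numpy as np

def make_transform(R, t):
    """Build the 4x4 homogeneous matrix [[R, t], [O^T, 1]]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, dtype=float).reshape(3)
    return T

def to_global(T, p_local):
    """Map a 3D point from the local coordinates of sensor 62 into the
    local coordinates of sensor 60, which serve as global coordinates."""
    p_h = np.append(np.asarray(p_local, dtype=float), 1.0)  # append the trailing 1
    return (T @ p_h)[:3]

# Illustrative values: 90-degree rotation about the z axis, then a shift along x
R2 = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
t2 = np.array([1.0, 0.0, 0.0])
T2 = make_transform(R2, t2)
p1 = to_global(T2, [1.0, 0.0, 0.0])
```

Packing the rotation and the translation into one matrix lets a chain of sensor-to-sensor conversions be composed by matrix multiplication.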

For the same person, let p_1^i and p_2^i (i = 0, 1, ..., m) be the three-dimensional sensor output vectors observed by RGB-D sensors 60 and 62, respectively, over m + 1 time steps. The matrix R_2 and the vector t_2 in the above equation are then obtained by the following equations.
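The equations for R_2 and t_2 were also images and are not reproduced here. The sketch below assumes a least-squares criterion, that is, choosing R_2 and t_2 to minimize the sum over i of the squared distance between p_1^i and R_2 p_2^i + t_2, in line with the earlier statement that calibration minimizes the sum of the output errors. It solves that problem with the standard SVD-based Kabsch method, which is swapped in for illustration and is not confirmed to be the method of the specification. The function name and test data are hypothetical.

```python
import numpy as np

def estimate_rigid_transform(p1, p2):
    """Least-squares R, t such that p1[i] is approximately R @ p2[i] + t.

    p1, p2: arrays of shape (m + 1, 3) holding corresponding positions
    of the same person as seen by the two sensors.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    c1, c2 = p1.mean(axis=0), p2.mean(axis=0)     # centroids
    H = (p2 - c2).T @ (p1 - c1)                   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])                    # avoid returning a reflection
    R = Vt.T @ D @ U.T
    t = c1 - R @ c2
    return R, t

# Made-up data: apply a known rotation and translation, then recover them
rng = np.random.default_rng(0)
p2_demo = rng.normal(size=(10, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
p1_demo = p2_demo @ R_true.T + t_true
R_est, t_est = estimate_rigid_transform(p1_demo, p2_demo)
```

With noisy measurements the recovered pair is only the best fit, and the residual sum of squared distances indicates how well the two tracks agree.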

Storing R_2 and t_2 makes it possible to convert the local coordinates of RGB-D sensor 62 into the local coordinates of RGB-D sensor 60, that is, into global coordinates. Obtaining the matrix R_2 and the vector t_2 constitutes the calibration of RGB-D sensors 60 and 62. Because RGB-D sensors 60 and 62 are both 3D sensors, this specification refers to such calibration as 3D+3D calibration.

Calibration between RGB-D sensor 60 and microphone array 64, on the other hand, can be performed as follows. Microphone array 64 cannot measure the distance to a target; it provides only the direction. The calibration between RGB-D sensor 60 and microphone array 64 is therefore called 3D+2D calibration.

Let p_3 be the output of microphone array 64 and p_1 the output of RGB-D sensor 60 for one and the same person. These can be expressed as follows.

Here, θ_1 is the azimuth angle and θ_2 the elevation angle of the direction from microphone array 64 to the target person.
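The expressions for p_3 and p_1 are missing from this text. As a rough illustration of how a direction-only measurement relates to a 3D position, the sketch below converts an azimuth and elevation pair into a unit direction vector and back. The axis convention (azimuth measured in the x-y plane from the x axis, elevation measured from that plane toward z) is an assumption and may differ from the one in the specification; the function names are hypothetical.

```python
import math

def direction_from_angles(azimuth, elevation):
    """Unit direction vector for azimuth theta_1 and elevation theta_2 (radians),
    under the assumed axis convention."""
    return (
        math.cos(elevation) * math.cos(azimuth),
        math.cos(elevation) * math.sin(azimuth),
        math.sin(elevation),
    )

def angles_from_point(x, y, z):
    """Azimuth and elevation of a 3D point as seen from the origin."""
    return math.atan2(y, x), math.atan2(z, math.hypot(x, y))

# A source straight along the y axis, at the height of the array
d = direction_from_angles(math.pi / 2, 0.0)

# Round trip: angles -> direction -> angles
az, el = angles_from_point(*direction_from_angles(0.4, 0.2))
```

Because the range is unknown, any point along the returned direction produces the same pair of angles, which is why the microphone array constrains orientation but not distance.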

The transformation from local coordinates p_3 of microphone array 64 into the local coordinates of RGB-D sensor 60 (that is, global coordinates) can be readily computed by the following equation, using the matrix ξ that represents the orientation of microphone array 64 in the Lie algebra so(3) associated with the special orthogonal group SO(3) mentioned above.

In the above equation, "i" is the identifier of the i-th person and "j" is the identifier of the sensor. The "1" on the left-hand side is omitted in the above equation. The "^" at the upper right of ξ_j indicates that the 3×3 matrix ξ_j is represented as a 3×1 vector. For j = 3, this matrix ξ_3 can be obtained by the following equation.
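The equations that use ξ are missing from this text, so the exact way the specification applies it cannot be recovered. What can be illustrated is the standard relationship between so(3) and SO(3) that the passage relies on: a 3×1 vector corresponds to a 3×3 skew-symmetric matrix, and the matrix exponential of that matrix is a rotation. The sketch below uses Rodrigues' formula for the exponential; the function names are hypothetical.

```python
import numpy as np

def skew(w):
    """3-vector -> 3x3 skew-symmetric matrix, an element of so(3)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def vee(W):
    """3x3 skew-symmetric matrix -> 3-vector (inverse of skew)."""
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def exp_so3(w):
    """Rodrigues' formula: rotation matrix in SO(3) for the axis-angle vector w."""
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)                 # no rotation
    K = skew(w / theta)                  # unit rotation axis as a skew matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Quarter turn about the z axis
R = exp_so3([0.0, 0.0, np.pi / 2])
rotated = R @ np.array([1.0, 0.0, 0.0])
```

Parameterizing the orientation by three numbers rather than nine matrix entries keeps the result a valid rotation during optimization without extra constraints.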

Here, m is the index of the last measurement when the first measurement is numbered 0, p_i^1 is the i-th measurement of RGB-D sensor 60, and p_i^3 is the i-th measurement of microphone array 64.

Obtaining ξ_3 in this way makes it possible to convert measurement data obtained in the local coordinates of microphone array 64 into the local coordinates of RGB-D sensor 60 (that is, global coordinates).

(3) Factor graph
The calibration described above presupposes that the persons measured by the sensors have already been matched to one another. In a real environment, however, measurement error makes it difficult to know exactly which person in the output of one sensor corresponds to which person in the output of another when, for example, the positions of several persons are measured by several sensors.

One approach is to estimate a probabilistic representation from such noisy data. The factor graph is a tool suited to this kind of estimation problem. Like a Bayesian network, a factor graph expresses a joint probability as a product of factors.

FIG. 3 shows graph 100 as an example. In FIG. 3, S_1 and S_2 are sensors; x_1i and y_1i (i = 1, 2) are the position data of a first person and a second person measured twice by sensor S_1; and x_2i and y_2i (i = 1, 2) are the position data of the first person and the second person measured twice by sensor S_2 at the same times as sensor S_1. The subscript i denotes the time step, x denotes position data of the first person, and y denotes position data of the second person.

As shown in FIG. 3, these form a graph 100 whose vertices are sensors S_1 and S_2, the position data x_1i and y_1i (i = 1, 2) measured by sensor S_1, and the position data x_2i and y_2i (i = 1, 2) measured by sensor S_2, and whose edges connect each sensor and all combinations of the vertices corresponding to position data measured at the same time step among the position data measured by that sensor.

As shown in FIG. 3, the positions that sensors S_1 and S_2 measure for the same person generally differ because of measurement error, so they cannot be associated with each other directly. In this embodiment, the measurement data are associated using graph 100 according to the following idea.

That is, referring to FIG. 4, suppose that x_11 and x_21, y_11 and y_21, x_12 and x_22, and y_12 and y_22 each represent the same person. If factor graph 100 can be transformed so that only the edges 120, 122, 124, and 126 connecting these pairs remain and all other edges (shown as dotted lines) are deleted, the measurement data of sensor S_1 can be associated with the measurement data of sensor S_2.
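The pruning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it reads the description as placing a candidate edge between every sensor-S_1 measurement and every sensor-S_2 measurement taken at the same time step, and the names `candidate_edges`, `prune`, and `match` are hypothetical, not taken from the patent.

```python
from itertools import product

def candidate_edges(num_persons, num_steps):
    """All cross-sensor edges between measurements taken at the same time step.

    A vertex is (sensor, track, step); track labels are arbitrary per sensor.
    """
    edges = []
    for step in range(1, num_steps + 1):
        for a, b in product(range(num_persons), repeat=2):
            edges.append(((1, a, step), (2, b, step)))
    return edges

def prune(edges, match):
    """Keep only edges whose endpoints are matched.

    match[a] == b means track a of sensor 1 and track b of sensor 2
    are assumed to be the same person.
    """
    return [(u, v) for u, v in edges if match[u[1]] == v[1]]

edges = candidate_edges(num_persons=2, num_steps=2)
kept = prune(edges, match={0: 0, 1: 1})
print(len(edges), len(kept))  # 8 4
```

With two persons and two time steps, four edges survive, which corresponds to the four solid edges 120, 122, 124, and 126 in FIG. 4. In the embodiment the `match` mapping is not known in advance; estimating it is the task of the neural network described next.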

In this embodiment, a graph neural network (GNN) is used for this association. A GNN is a type of neural network suited to processing graph-structured data. GNNs have recently been found to be very effective for reasoning and for multi-agent interactive systems. In Non-Patent Document 3, a GNN is used for local message passing on a graph in order to perform pairwise comparison. The system according to the first embodiment of the present invention described below takes its cue from Non-Patent Document 3 and uses a GNN to estimate which persons measured by two sensors are the same. The details are described later.

The embodiment described below uses a neural network that determines whether two time series belong to the same person. The two time series consist of first and second position data that a plurality of sensors output over a predetermined number of time steps. Using this neural network, for example, the two time series that the first sensor outputs for two persons are compared with the two time series that the second sensor outputs for the same two persons, and each time series of the first sensor is associated with the time series of the second sensor that represents the same person. To this end, the neural network is trained to provide this function, taking a cue from Non-Patent Document 3.

(4) Overall system configuration
FIG. 5 shows the overall configuration of a calibration system 150 according to the first embodiment of the present invention. Referring to FIG. 5, calibration system 150 includes an autoencoder 178, which contains the above neural network as one of its parts and is used to train that neural network, and a parameter storage unit 180 for storing the parameters of the neural network trained by means of autoencoder 178. Hereinafter, this neural network is referred to as the encoder.

Calibration system 150 further includes: a training data generation unit 160 for generating training data for training the encoder; a training data collection device 170 for collecting the training data generated by training data generation unit 160; a training data storage unit 172 for storing the collected training data in a computer-readable format; a training program storage unit 176 for storing a computer program for training the encoder; an online calibration device training system 174 that executes the program stored in training program storage unit 176 and trains autoencoder 178 using the training data stored in training data storage unit 172, thereby training the encoder that is part of autoencoder 178; and the parameter storage unit 180, which stores the parameters for realizing the trained autoencoder 178 on a computer.

In this example, training data generation unit 160 includes RGB-D sensors 60 and 62 and generates time series of the position data of two persons moving within a predetermined area, measured at every time step for a predetermined number of time steps. Naturally, many such sets of training data are generated for training and stored in training data storage unit 172.

Calibration system 150 further includes an acoustic processing system 162 to be calibrated. In this embodiment, like training data generation unit 160, acoustic processing system 162 includes RGB-D sensors 60 and 62 and acquires time series of the position data of two persons moving within a predetermined area. The purpose in this example is to calibrate the positions and orientations of RGB-D sensors 60 and 62, so their positions and orientations need not be set precisely before calibration.

Calibration system 150 further includes an online calibration device 182 and a calibration parameter storage unit 186. Online calibration device 182 uses the encoder parameters among the parameters stored in parameter storage unit 180, together with the program that implements the encoder algorithm stored in training program storage unit 176, to process time-series position data of a predetermined number of time steps obtained from acoustic processing system 162 and to perform calibration between RGB-D sensor 60 and RGB-D sensor 62. Calibration parameter storage unit 186 stores the resulting calibration parameters. In this embodiment, the local coordinates of RGB-D sensor 60 are taken as the world coordinates, and online calibration device 182 obtains the parameters for converting the local coordinates of RGB-D sensor 62 into world coordinates (the matrix R_2 and vector t_2 described above) and stores them in calibration parameter storage unit 186.
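As an illustration of how such calibration parameters would be used, the following NumPy sketch applies a rotation matrix and translation vector to points in a sensor's local frame. It assumes the usual rigid-transform form p_world = R p + t; the defining equation is given earlier in the specification and is not reproduced here, and the function name `to_world` and the numeric values are hypothetical.

```python
import numpy as np

def to_world(points_local, R, t):
    """Map (N, 3) points from a sensor's local frame to the world frame.

    Assumes the rigid-transform convention p_world = R @ p_local + t.
    """
    points_local = np.asarray(points_local, dtype=float)
    return points_local @ R.T + t

# Hypothetical calibration result: 90-degree rotation about z, 1 m shift in x.
theta = np.pi / 2
R2 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
t2 = np.array([1.0, 0.0, 0.0])

p_world = to_world([[1.0, 0.0, 0.0]], R2, t2)
print(np.round(p_world, 6))  # the local point (1, 0, 0) maps to (1, 1, 0)
```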

Calibration system 150 further includes a calibration device background update system 184, which trains the encoder concurrently, using the time-series data obtained during calibration by online calibration device 182 and during actual operation of acoustic processing system 162.

(5) Realization by computer
In FIG. 5, every functional unit other than training data generation unit 160 and acoustic processing system 162 is realized by computer hardware and a computer program executed on it. FIG. 6 shows the appearance of such a computer system 290, and FIG. 7 is a block diagram of its hardware configuration.

Referring to FIG. 6, computer system 290 includes a computer 300 having a DVD drive 310, a keyboard 306, a mouse 308, and a monitor 302.

Referring to FIG. 7, in addition to DVD drive 310, computer 300 includes a CPU 316; a bus 326 connected to CPU 316 and DVD drive 310; a GPU 317 for performing the numerical computation of neural network training and inference at high speed; a ROM 318 that stores a boot-up program and the like for computer system 290; a RAM 320 that is connected to bus 326 and stores program instructions to be executed, the system program, working data during program execution, and the like; and a hard disk 314, which is a non-volatile memory. Computer system 290 further includes a network I/F 304 that provides a connection to a network 328 enabling communication with other terminals, and a USB memory port 312 to which a USB memory 330 can be attached and which enables data exchange between USB memory 330 and each part of computer system 290. GPU 317 serves to speed up computation; it is not functionally essential and CPU 316 can take its place, but having GPU 317 is desirable for speed.

In this embodiment, training data storage unit 172, training program storage unit 176, parameter storage unit 180, calibration parameter storage unit 186, and the like shown in FIG. 5 are all realized by hard disk 314 or RAM 320.

The computer program that causes computer system 290 to realize the functions of calibration system 150 and its components is stored on a DVD 322 loaded into DVD drive 310 or on USB memory 330, and is transferred to hard disk 314 from DVD drive 310 or USB memory port 312. Alternatively, the program may be sent to computer 300 over network 328 and stored on hard disk 314. The program is loaded into RAM 320 when executed. It may also be loaded into RAM 320 directly from DVD 322 or over the network.

This program includes a plurality of instructions that cause computer 300 to operate as training data collection device 170, online calibration device training system 174, autoencoder 178, online calibration device 182, and calibration device background update system 184 of calibration system 150 of this embodiment. Some of the basic functions needed for this operation are provided by the operating system (OS) running on computer 300, by third-party programs, or by modules of various programming toolkits installed on computer 300. The program therefore need not contain every function required to realize the system and method of this embodiment. It need only contain those instructions that carry out the operation of calibration system 150 and its components by calling the appropriate functions or programming tools in a controlled manner so as to obtain the desired result. Of course, the program may instead contain all the instructions needed for computer system 290 to realize the desired functions. The operation of computer system 290 is well known and is not repeated here.

This program may be a so-called object program that CPU 316 can execute directly, or it may be in script form, which an interpreter must convert step by step into an executable form.

(6) Autoencoder
FIG. 8 shows the schematic configuration of autoencoder 178 shown in FIG. 5. Referring to FIG. 8, autoencoder 178 includes an encoder 350 and a decoder 352. Encoder 350 receives time series of the position data of two vertices over a fixed number of time steps and outputs a value that follows a probability distribution 354, p(e | ν), of whether an edge exists between the two vertices, where ν is the time series of position data of the two vertices and e is a value indicating whether there is an edge between them. Decoder 352 is a neural network. Among the values that encoder 350 outputs according to probability distribution 354 in response to position data from different combinations of vertices, the largest is identified, and the position data at a specific time step of the corresponding combination of vertices is input to decoder 352, which is trained so that its output equals that input. The vector output by decoder 352 therefore has the same number of dimensions as its input vector.

Online calibration device training system 174 gives encoder 350 the position-data time series of each possible combination of vertices, gives decoder 352 the position data at a specific time point of the combination whose value sampled from probability distribution 354 (the output of encoder 350) is largest, and adjusts the parameters of encoder 350 and decoder 352 so that the output of decoder 352 approaches its input. This operation is performed in order from the beginning to the end of the position-data time series, and that pass is repeated until a predetermined termination condition holds. In this embodiment, the termination condition is satisfied when the pass has been repeated a predetermined number of times.

(7) Encoder training
Encoder 350 is inspired by the neural network described in Non-Patent Document 3, which passes messages from the vertices of a graph to its edges and then from the edges back to the vertices. FIG. 9 outlines the configuration of the neural network described in Non-Patent Document 3.

Referring to FIG. 9, this message-passing neural network 400 includes a first-stage neural network 410 and a second-stage neural network 412. Neural network 410 takes the two-vertex combinations 390, 392, and 394 as separate inputs and outputs values e_{1,2}, e_{1,3}, and e_{2,3} indicating whether an edge exists between the vertices of each combination. Neural network 412 receives as inputs all combinations 420, 422, and 424 of two of the values output by neural network 410 and is trained to output values 440, 442, and 444 corresponding to the three vertices input to neural network 410. In this embodiment, neural networks 410 and 412 both consist of fully connected layers.

In this embodiment, building on the configuration shown in FIG. 9, an encoder 350 having the configuration shown in FIG. 10 is used as the front stage of autoencoder 178. Referring to FIG. 10, encoder 350 includes the message-passing neural network 400 shown in FIG. 9 and a decoder 460. Decoder 460 receives combinations 450, 452, and 454, each formed from two of the values 440, 442, and 444 that message-passing neural network 400 outputs in response to the three vertex combinations, and outputs scores 470, 472, and 474, each indicating the probability that an edge exists between the vertices corresponding to the combination. Scores 470, 472, and 474 indicate the probability that vertices 1 and 2, vertices 1 and 3, and vertices 2 and 3, respectively, are connected by an edge, and decoder 460 is trained to output such values. Encoder 350, including decoder 460, is trained by online calibration device training system 174 shown in FIG. 8 through the training performed on autoencoder 178 as a whole.
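The data flow of FIGS. 9 and 10 (vertex pairs to edge values, edge values back to vertex values, vertex-value pairs to edge scores) can be sketched as follows. This is a minimal NumPy illustration with random, untrained weights, not the patented network: the layer sizes, the tanh activation, the use of a sum to aggregate edge values per vertex, and the names `dense` and `edge_scores` are all assumptions made for the example.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """Return a one-layer fully connected map with random (untrained) weights."""
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda x: np.tanh(x @ W + b)

def edge_scores(X, hidden=16):
    """X: (num_vertices, feature_dim). Returns {(i, j): score in (0, 1)}."""
    n, d = X.shape
    pairs = list(combinations(range(n), 2))
    f_v2e = dense(2 * d, hidden)                      # stands in for network 410
    f_e2v = dense(hidden, hidden)                     # stands in for network 412
    w_out = rng.normal(scale=0.1, size=2 * hidden)    # stands in for decoder 460

    # Vertex -> edge: one value per pair of vertices.
    e = {p: f_v2e(np.concatenate([X[p[0]], X[p[1]]])) for p in pairs}
    # Edge -> vertex: aggregate the values of all edges touching each vertex.
    v = [f_e2v(sum(e[p] for p in pairs if i in p)) for i in range(n)]
    # Vertex -> edge again, squashed by a sigmoid to a probability-like score.
    return {p: 1.0 / (1.0 + np.exp(-(np.concatenate([v[p[0]], v[p[1]]]) @ w_out)))
            for p in pairs}

X = rng.normal(size=(3, 6 * 10))   # 3 vertices, each a 10-step series of 6-D data
scores = edge_scores(X)
print(sorted(scores))              # [(0, 1), (0, 2), (1, 2)]
```

With three vertices, each vertex touches exactly two edges, which matches the description of network 412 receiving combinations of two edge values. The scores are meaningless until the weights are trained as described below.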

FIG. 11 shows, in flowchart form, the control structure of a computer program that causes computer system 290 to train autoencoder 178. Referring to FIG. 11, the program begins execution at step 500. In step 500, the position and orientation of each sensor to be calibrated are initialized with random numbers. The parameters of encoder 350 and decoder 352 are also initialized in step 500. They may be initialized with random numbers, with values determined by prior pre-training, or with values trained in another system.

The program further includes step 502 of reading from training data storage unit 172 shown in FIG. 5 the training data collected in advance by training data collection device 170 and loading it into RAM 320 shown in FIG. 7. To collect the training data, RGB-D sensors 60 and 62 are installed in training data generation unit 160 shown in FIG. 5, and while a predetermined number of persons (two in this embodiment) move within a predetermined area, training data collection device 170 collects their positions and velocities as time-series data at a predetermined time interval (time step) over a predetermined period (a predetermined number of time steps) and stores the data in training data storage unit 172. This must be repeated under a variety of conditions to collect many sets of training data.

Following step 502, the program further includes step 504 of training autoencoder 178 by repeating process 506 a predetermined number of times over every set of training data, and step 508 of storing the parameters of autoencoder 178 trained in step 504 in a non-volatile storage device such as hard disk 314 shown in FIG. 7 and ending execution of the program.

A set of training data here means the set of position-data time series that training data generation unit 160 collects for two persons in one training data collection session. A "session" means the collection of measurement data for a predetermined number of time steps.

The measurement data at each time step contain, for each sensor, the position data that the sensor measured for the two persons. Each item of position data consists of three-dimensional coordinates and their differences (velocity), so the position data output by a sensor at one time step is six-dimensional. The measurement data obtained from two sensors for two persons at one time step therefore contain 2 × 2 × 6 values. The coordinates are given in local coordinates whose origin is the respective sensor. Because these are obtained at every time step, the measurement data of one session comprise two time series of the two persons' position data output by the first sensor and two time series of the two persons' position data output by the second sensor. This collection of four position-data time series is here called a "set" of time-series data.

The structure of this time-series data is shown below in tabular form.

As the table shows, the position data that each sensor outputs for each person at one time step is six-dimensional (position + velocity). If N time steps are measured in one session, the time series that one sensor outputs for one person is a sequence of N six-dimensional (position + velocity) vectors, which can also be regarded as a single 6 × N-dimensional vector. With two persons and two sensors, the training data of one session as a whole are four 6 × N-dimensional vectors. These four vectors together constitute the "set" described above, and each of them is one of the time series described above.
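The layout of one session can be made concrete with a small NumPy sketch. It assumes that velocity is the difference between consecutive positions (the text calls it "difference (velocity) data") and, as a further assumption of this example only, sets the velocity at the first time step to zero; the random positions and the name `make_series` are placeholders.

```python
import numpy as np

def make_series(positions):
    """positions: (N, 3) positions over N time steps -> (N, 6) position + velocity.

    Velocity is taken as the difference from the previous step (zero at step 0).
    """
    positions = np.asarray(positions, dtype=float)
    velocity = np.diff(positions, axis=0, prepend=positions[:1])
    return np.hstack([positions, velocity])

N = 5
rng = np.random.default_rng(0)
# session[s, p] is the series that sensor s reports for person p.
session = np.stack([
    np.stack([make_series(rng.normal(size=(N, 3))) for _person in range(2)])
    for _sensor in range(2)
])
print(session.shape)                  # (2, 2, 5, 6)
print(session.reshape(4, -1).shape)   # (4, 30): four 6*N-dimensional vectors
```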

Process 506 includes step 510 of repeating the following step 512 for each set of training data, starting from the beginning of the set.

Step 512 includes: step 520 of executing process 522 for each possible pair within the set being processed; step 524 of comparing the values sampled from encoder 350 as a result of step 520, one per pair, selecting from the training data of the pair with the highest value the item at the position in the sequence designated by step 510, and inputting it to decoder 352; step 526 of computing the output of decoder 352 in response to the input of step 524; and step 528 of adjusting all parameters of autoencoder 178 by error backpropagation using the error between the input to decoder 352 and its output, and ending step 512.

Process 522 includes step 540 of inputting to encoder 350 all the training data in the set that correspond to the pair designated by step 520, and step 542 of sampling the value that encoder 350 outputs in response and ending process 522. The sampled values are held temporarily in, for example, RAM 320 shown in FIG. 7.

Executing the process shown in FIG. 11 trains autoencoder 178 shown in FIG. 8, and thus also trains encoder 350, which is part of it.
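The nesting of the loops in FIG. 11 can be sketched as plain Python control flow. This is a skeleton only: the networks are passed in as callables, the data structure (a dictionary from a pair label to its per-step samples) is an assumption of the example, and the names `train`, `encoder_sample`, `decoder`, and `backprop` are hypothetical.

```python
def train(sets, encoder_sample, decoder, backprop, epochs):
    """Control flow of steps 504 to 528 with the networks passed in as callables.

    sets: list of sets; each set maps a pair label to a list of per-step samples.
    encoder_sample(series) -> float : sampled edge value for a pair (steps 540, 542)
    decoder(x) -> reconstruction    : step 526
    backprop(x, y)                  : parameter update from the error (step 528)
    """
    for _ in range(epochs):                        # step 504
        for pairs in sets:                         # process 506
            num_steps = len(next(iter(pairs.values())))
            for step in range(num_steps):          # step 510 repeating step 512
                sampled = {k: encoder_sample(v) for k, v in pairs.items()}  # step 520
                best = max(sampled, key=sampled.get)                        # step 524
                x = pairs[best][step]
                y = decoder(x)                                              # step 526
                backprop(x, y)                                              # step 528

# Stub networks, just to exercise the control flow.
calls = []
demo = [{"a": [1.0, 2.0], "b": [3.0, 4.0]}]
train(demo, encoder_sample=lambda s: sum(s), decoder=lambda x: x,
      backprop=lambda x, y: calls.append((x, y)), epochs=2)
print(calls)  # [(3.0, 3.0), (4.0, 4.0), (3.0, 3.0), (4.0, 4.0)]
```

In the stub run, pair "b" has the larger sampled value, so its samples are the ones passed through the decoder at each step.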

(8) Calibration
Of autoencoder 178 trained as described above, encoder 350 is used to calibrate acoustic processing system 162 of FIG. 5. FIG. 5 shows only two RGB-D sensors, 60 and 62, but in practice acoustic processing system 162 is often provided with three or more sensors. Nor is each sensor limited to a three-dimensional sensor such as RGB-D sensor 60; it may be a microphone array. Once the local coordinates of one of the sensors are chosen as the global coordinates, every other sensor can be calibrated pairwise against that sensor. The sensor whose local coordinates are chosen as the global coordinates is here called the reference sensor.

FIG. 12 shows, in flowchart form, the control structure of a program that causes computer system 290 shown in FIGS. 6 and 7 to function as online calibration device 182 shown in FIG. 5. Referring to FIG. 12, the program includes step 560 of performing initial processing. In this initial processing, the position coordinates and orientation of each sensor are initialized with random numbers. Step 560 also performs other tasks such as reserving in RAM 320 shown in FIG. 7 the storage areas needed for processing.

The program further includes step 562 of having a predetermined number of persons (two in this embodiment) move around within acoustic processing system 162 shown in FIG. 5 and collecting calibration data from each sensor. The calibration data are assumed to have the same form as the data used to train encoder 350: if each item of training data spans m + 1 time steps, the calibration data collected in step 562 also span m + 1 time steps. Calibration data are obtained for each of the sensors.

The program further includes step 564 of calibrating each sensor by repeating process 566, described below, once for every sensor other than the reference sensor, and saving the calibration parameters in RAM 320 or the like.

Process 566 includes: step 580 of repeating the following step 582 once for each possible pair of data in the calibration data being processed; step 584 of comparing the values that encoder 350 output for each data pair in step 580, determining whether each data pair refers to the same person, and associating the data accordingly (that is, identifying the persons); step 586 of computing the calibration parameters for the position and orientation of the target sensor (a sensor other than the reference sensor) using the formula appropriate to the sensor type; and step 588 of saving the calibration parameters computed in step 586 in RAM 320 or the like and ending process 566.

Step 582 includes step 600 of inputting to encoder 350 all the calibration data of the pair being processed, and step 602 of temporarily saving in RAM 320 or the like the value that encoder 350 outputs in response to the input given in step 600 and ending step 582.
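The association of step 584 can be illustrated as follows. The text says only that the encoder outputs are compared; as an assumption for this sketch, the rule used is to pick the one-to-one assignment of sensor-1 tracks to sensor-2 tracks with the highest total score. The function name `associate` and the score values are hypothetical.

```python
from itertools import permutations

def associate(score):
    """Pick the one-to-one track assignment with the highest total score.

    score[a][b] is the encoder score that track a (reference sensor) and
    track b (target sensor) belong to the same person.
    """
    n = len(score)
    best = max(permutations(range(n)),
               key=lambda perm: sum(score[a][perm[a]] for a in range(n)))
    return dict(enumerate(best))

# Hypothetical encoder outputs for two persons seen by two sensors.
score = [[0.2, 0.9],
         [0.7, 0.4]]
print(associate(score))  # {0: 1, 1: 0}
```

Enumerating permutations is fine for the two-person case of this embodiment; for many persons an assignment solver would be the more practical choice.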

When the computer system 290 executes this process, each sensor in the calibration target acoustic processing system 162 is calibrated. Using the calibration parameters obtained in this process, the position and orientation of each sensor can be converted into the local coordinates of the reference sensor (which serve as the global coordinates), so that the position of each person can be determined.

Note that the equation used in step 586 to calculate the calibration parameters differs between a three-dimensional sensor such as an RGB-D sensor and a two-dimensional sensor such as a microphone array. These equations are as described above.

(9) Parallel training with calibration
In this embodiment, the calibration parameters continue to be updated using the data output by each sensor even after they have been determined as described above. FIG. 13 shows the control structure of the program that causes the computer system 290 to function as the calibration device background update system 184 for this purpose.

Referring to FIG. 13, this program includes: step 620 of performing initial processing; step 622 of calculating updated values of the calibration parameters of each sensor by repeating process 624, described below, until a termination condition is satisfied; and step 626 of updating the calibration parameters stored in the calibration parameter storage unit 186 of FIG. 5 with the updated calibration parameters calculated for each sensor in step 622.

In step 620, processing is performed such as securing the storage area needed for processing in the RAM 320 and reading the calibration parameters of each sensor, stored in the calibration parameter storage unit 186 shown in FIG. 5, into the RAM 320.

Process 624 includes step 640 of receiving, from each sensor to be updated, update data spanning a predetermined number of time steps, in the same format as the training data and the calibration data, and step 642 of repeating process 644, described below, a predetermined number of times.

Process 644 includes step 660 of executing process 662, described below, on each set of update data in order from the beginning of its time series.

Process 662 includes: step 680 of executing process 682 on each possible data pair in the set being processed, thereby sampling, for each data pair, the output of the encoder 350 when all the update data for that data pair is input; step 684 of selecting a pair of time series as the decoder input according to the sampling results of step 680, and inputting the data of that pair at the position specified in step 660 into the decoder 352; step 686, following step 684, of calculating the output of the decoder 352; and step 688 of adjusting the parameters of the autoencoder 178 by error backpropagation using the error between the input to the decoder 352 and the output of the decoder 352 obtained in step 686.

Process 682 includes step 700 of inputting all the update data of the pair being processed, within the set being processed, into the encoder 350, and step 702 of sampling the output of the encoder 350 for the input given in step 700.

This process can run in the background in parallel with the detection of person positions from the sensor outputs. Accordingly, when the computer system 290 executes the program shown in FIG. 13, the computer system 290 functions as the calibration device background update system 184 shown in FIG. 5.

2 Operation
The calibration system 150 configured as described above operates as follows.

(1) Overall flow of operation
The overall flow of operation of the calibration system 150 is as follows.

- Sensors such as the RGB-D sensors 60 and 62 are arranged in the training data generation unit 160.
- The training data collection device 170 collects training data and stores it in the training data storage unit 172.
- The online calibration device training system 174 trains the autoencoder 178 using the training data storage unit 172 and the autoencoder 176. The parameters of the trained autoencoder 178 are stored in the parameter storage unit 180.

- Sensors such as the RGB-D sensors 60 and 62 are arranged in the calibration target acoustic processing system 162.
- Two persons walk around within the area covered by the RGB-D sensors 60, 62 and so on, and the online calibration device 182 collects calibration data during that time.
- The online calibration device 182 reads the parameters of the autoencoder 178 from the parameter storage unit 180 and constructs the encoder 350 and the decoder 352.
- The online calibration device 182 calculates the calibration parameters of each sensor by executing the process shown in FIG. 12 on the calibration data. The calibration parameters are stored in the calibration parameter storage unit 186.

- Thereafter, a sound source localization device or the like (not shown) detects the positions of persons within the predetermined area using the calibration parameters stored in the calibration parameter storage unit 186.
- In parallel with this position detection, the computer system 290 trains the autoencoder 178 in the background using the time series data obtained during detection. As a result, the autoencoder 178, including the encoder 350, is updated based on the new data.

(2) Training of the encoder 350
The encoder 350 is trained as follows. Referring to FIG. 5, two persons walk around in the area where the RGB-D sensors 60, 62 and so on are arranged, and the training data collection device 170 collects the sensor outputs during that time. These outputs are stored in the training data storage unit 172 as training data for the autoencoder 178. Once the required amount of training data has been collected, the autoencoder 178 is trained.

Referring to FIG. 11, in step 500 the positions and orientations of the sensors to be calibrated are all initialized with random numbers. Step 500 also initializes the parameters of the encoder 350 and the decoder 352. This initialization may use random numbers, or values determined by predetermined pre-training may be assigned to the parameters. Values trained in another system may also be assigned.

Next, in step 502, the computer system 290 reads the training data from the training data storage unit 172 and loads it into the RAM 320 shown in FIG. 7. As described above, this training data includes sets of training data obtained in a plurality of sessions. Each set includes four time series of training data. In terms of individual time steps, the training data for each time step includes four six-dimensional vectors, because each of the two sensors outputs a six-dimensional (position + velocity) vector for each of the two persons.
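As an illustration of the data layout described above, the following minimal sketch builds one placeholder set of training data: four time series, each holding one six-dimensional vector per time step. The track names `x1`, `y1`, `x2`, `y2`, the function names, and the split of the six dimensions into three position and three velocity components are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

# Assumed component order: (px, py, pz, vx, vy, vz).
Vec6 = Tuple[float, float, float, float, float, float]


def make_training_set(num_steps: int) -> Dict[str, List[Vec6]]:
    """Build a placeholder set with four tracks (2 sensors x 2 persons)."""
    tracks = ["x1", "y1", "x2", "y2"]  # hypothetical names, one per (sensor, person)
    zero: Vec6 = (0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    return {name: [zero for _ in range(num_steps)] for name in tracks}


def vectors_at(training_set: Dict[str, List[Vec6]], step: int) -> List[Vec6]:
    """Return the four six-dimensional vectors belonging to one time step."""
    return [series[step] for series in training_set.values()]
```

Calling `vectors_at(make_training_set(100), 0)` returns the four six-dimensional vectors of the first time step.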

Following step 502, in step 504 the autoencoder 178 is trained by repeating process 506 a predetermined number of times over all the sets of training data.

In step 508, the parameters of the autoencoder 178 trained in step 504 are stored in a non-volatile storage device such as the hard disk 314 shown in FIG. 7, and execution of the program ends.

In process 506, first, for each set included in the training data, the measurement data of the first time step is selected (step 510), and process 522 is performed on each of the data pairs possible within that set. The possible data pairs within a set are the pairs that can be formed between the measurement data of the first and second persons from the first sensor and the measurement data of the first and second persons from the second sensor. Taking FIG. 3 as an example, the measurement data at the first time step are x₁₁, y₁₁, x₂₁ and y₂₁, and the measurement data at the second time step are x₁₂, y₁₂, x₂₂ and y₂₂. Of these, x₁₁ and x₁₂ form the first time series, denoted x₁. Similarly, x₂₁ and x₂₂ form the time series x₂, y₁₁ and y₁₂ form the time series y₁, and y₂₁ and y₂₂ form the time series y₂. A possible combination pairs one of the time series x₁ and y₁ observed by the sensor S₁ with one of the time series x₂ and y₂ observed by the sensor S₂. There are therefore four possible combinations: (x₁, x₂), (x₁, y₂), (y₁, x₂), and (y₁, y₂). These are represented in FIG. 3 as edges connecting the measurement data.
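The enumeration of possible pairs can be sketched as a Cartesian product of the tracks seen by each sensor. The function name and the string labels below are illustrative assumptions, not identifiers from the patent.

```python
from itertools import product
from typing import List, Sequence, Tuple


def candidate_pairs(sensor1_tracks: Sequence[str],
                    sensor2_tracks: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair every track seen by the first sensor with every track seen by the second."""
    return list(product(sensor1_tracks, sensor2_tracks))


pairs = candidate_pairs(["x1", "y1"], ["x2", "y2"])
print(pairs)  # [('x1', 'x2'), ('x1', 'y2'), ('y1', 'x2'), ('y1', 'y2')]
```

With two persons per sensor this yields the four combinations listed above; with n persons per sensor the same construction would yield n × n candidates.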

In step 520, process 522 is executed for all four of these combinations. In step 540 of process 522, taking the combination (x₁, x₂) as an example, a vector formed by concatenating all the training data x₁₁ and x₁₂ constituting the time series x₁ with all the training data x₂₁ and x₂₂ constituting the time series x₂ is input to the encoder 350. In response, the encoder 350 performs the computation determined by its internal parameters and outputs a single value. This value is sampled in step 542 from the probability distribution 354 defined by the encoder 350, and is a score indicating whether the time series x₁ and x₂ measure the position of the same person. Naturally, at the start of training the encoder 350 is not yet able to make correct predictions, so this score is unreliable. However, as process 506 is executed repeatedly, the parameters of the encoder 350 are trained, and the encoder becomes able to output, with high accuracy, a score indicating whether an input pair of time series are measurements of the same person.

This score is temporarily stored in the RAM 320 as the value corresponding to the combination (x₁, x₂). Process 522 is likewise executed for the other three combinations (x₁, y₂), (y₁, x₂), and (y₁, y₂), and the values obtained in step 542 are stored in the RAM 320.

When step 520 is complete, in step 524 the combination with the highest score among the four is selected, and the measurement data of its time series at the position specified by step 510 are combined and input to the decoder 352. For example, if (x₁, y₂) has the highest score, the vector formed by concatenating x₁₁ and y₂₁ is input to the decoder 352.
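The selection in step 524 can be sketched as follows, assuming the scores are held in a dictionary keyed by track pair and each time series is a list of per-step vectors. The function name, the dictionary layout, and the example numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def select_decoder_input(scores: Dict[Pair, float],
                         series: Dict[str, List[Sequence[float]]],
                         step: int) -> Tuple[Pair, List[float]]:
    """Pick the highest-scoring pair and concatenate its vectors at one time step."""
    best = max(scores, key=scores.get)
    first, second = best
    return best, list(series[first][step]) + list(series[second][step])


# Toy values chosen only to exercise the function.
example_scores = {("x1", "x2"): 0.1, ("x1", "y2"): 0.8,
                  ("y1", "x2"): 0.3, ("y1", "y2"): 0.2}
example_series = {name: [(float(i),) * 6, (float(i) + 0.5,) * 6]
                  for i, name in enumerate(["x1", "y1", "x2", "y2"])}
best_pair, decoder_input = select_decoder_input(example_scores, example_series, 0)
```

Here `best_pair` is `("x1", "y2")` and `decoder_input` is a 12-element vector built from the first-step vectors of those two tracks.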

In step 526, the output of the decoder 352 for the input vector is calculated according to the parameters of the decoder 352.

In step 528, the entire autoencoder 178 is trained by error backpropagation using the error between the vector input to the decoder 352 (in the current example, the concatenation of x₁₁ and y₂₁) and the vector output by the decoder 352.

Next, the processing of step 512 is performed on the measurement data of the next time step. In this case, the processing performed in step 520 is exactly the same as that performed for the measurement data of the first time step, except that the parameters of the encoder 350 have changed since the first iteration.

The same processing follows, except that what is selected in step 524 and input to the decoder 352 is the combination of the second measurement-data vectors of the time series whose value sampled in step 542 is largest. The processing of step 528 is the same as that performed for the first measurement data.

When the processing of step 512 has been executed in this way for all time steps of one set of measurement data, the first pass of process 506 is complete. As a result, the parameters of the autoencoder 178 have changed further. According to step 504, this processing is repeated several more times, and training of the parameters of the autoencoder 178 progresses with each repetition. Here, the termination condition of step 504 is satisfied once process 506 has been executed a predetermined number of times; training then ends and processing proceeds to step 508.
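The control flow of steps 504 to 528 can be summarized in a short sketch. This is not the patented implementation: the encoder, the decoder, and the parameter update are passed in as plain callables standing in for the neural networks and backpropagation, the squared error is an assumed choice of error measure, and all names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

Series = Dict[str, List[Sequence[float]]]


def run_training(sets: List[Series],
                 encoder_score: Callable[[list, list], float],
                 decoder: Callable[[List[float]], List[float]],
                 update: Callable[[float], None],
                 num_passes: int) -> int:
    """Repeat the per-time-step score/select/decode/update cycle; return the update count."""
    updates = 0
    for _ in range(num_passes):                       # repetition of step 504
        for series in sets:                           # each set of training data
            for t in range(len(series["x1"])):        # time steps in order
                # Score every candidate pair from the full time series (step 520).
                scores = {(a, b): encoder_score(series[a], series[b])
                          for a, b in product(("x1", "y1"), ("x2", "y2"))}
                best_a, best_b = max(scores, key=scores.get)   # step 524
                dec_in = list(series[best_a][t]) + list(series[best_b][t])
                dec_out = decoder(dec_in)                      # step 526
                # Assumed error measure: sum of squared differences.
                error = sum((o - i) ** 2 for o, i in zip(dec_out, dec_in))
                update(error)                                  # stands in for step 528
                updates += 1
    return updates
```

With one set of three time steps and two passes, `update` is called six times, once per time step per pass, which matches the per-time-step update described above.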

In step 508, the parameters of the autoencoder 178 obtained by the repetition in step 504 are saved in the RAM 320 and then copied to the parameter storage unit 180, which consists of the hard disk 314 (FIG. 7) or the like. This completes the training of the autoencoder 178 (and the encoder 350).

(3) Calibration and parallel training
Using the encoder 350 trained by the processing described above, each sensor in the calibration target acoustic processing system 162 (FIG. 5) is calibrated as follows.

Referring to FIG. 12, initial processing is performed in step 560. In this embodiment, this initial processing sets the values for the position and orientation of each sensor to random numbers. The encoder 350 is also constructed by reading its parameters from the parameter storage unit 180. This encoder 350 provides the same algorithm as the one trained using the autoencoder 178.

In step 562, sets of calibration data are collected, one per sensor. Here the calibration data is assumed to span m + 1 time steps. The calibration data is collected by having two persons walk around within the target area of the calibration target acoustic processing system 162 and recording the output of each sensor during that time.

Next, in step 564, process 566 is repeated for each sensor other than the reference sensor. As explained in the description of the configuration, the reference sensor is the sensor whose local coordinates have been designated to serve as the global coordinates. The purpose of the calibration process is to obtain the parameters for converting coordinate values in the local coordinates of the other sensors into coordinate values in these global coordinates.
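To make the role of such parameters concrete, the sketch below applies a generic two-dimensional rigid transform (a rotation followed by a translation) to a point measured in a sensor's local frame. This is a standard textbook transform used here as an assumption; the patent's own calibration equations are given earlier in the document and are not reproduced here. The function and parameter names are hypothetical.

```python
import math
from typing import Tuple


def local_to_global(point: Tuple[float, float],
                    sensor_xy: Tuple[float, float],
                    sensor_yaw: float) -> Tuple[float, float]:
    """Rotate a local 2D point by the sensor's yaw, then shift by the sensor's position."""
    c, s = math.cos(sensor_yaw), math.sin(sensor_yaw)
    x, y = point
    return (sensor_xy[0] + c * x - s * y,
            sensor_xy[1] + s * x + c * y)
```

For a sensor located at (1, 0) in the global frame and rotated by 90 degrees, a point one unit ahead of the sensor, (1, 0) in local coordinates, maps to approximately (1, 1) in global coordinates.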

In process 566, step 582 is executed once for each possible pair of time series contained in the outputs of the sensor being processed and the reference sensor. This processing is the same as that executed in process 522 of FIG. 11, differing only in that the data being processed is calibration data rather than training data.

As a result of step 582, an output of the encoder 350 is obtained for every possible pair. The time series of the pair with the highest value are associated with each other as position data for the same person. In this example, the time series of the remaining pair are then automatically associated with each other as position data for the other person.
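For the two-person case, the association rule just described can be sketched as follows: take the highest-scoring pair, then pair the two leftover tracks with each other. The function name, the dictionary layout, and the example scores are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def associate_two_persons(scores: Dict[Pair, float]) -> List[Pair]:
    """Return the best-scoring pair plus the complementary pair of leftover tracks."""
    best = max(scores, key=scores.get)
    reference_tracks = {pair[0] for pair in scores}
    target_tracks = {pair[1] for pair in scores}
    # With exactly two tracks per sensor, one track remains on each side.
    (other_reference,) = reference_tracks - {best[0]}
    (other_target,) = target_tracks - {best[1]}
    return [best, (other_reference, other_target)]


matches = associate_two_persons({("x1", "x2"): 0.2, ("x1", "y2"): 0.9,
                                 ("y1", "x2"): 0.7, ("y1", "y2"): 0.1})
print(matches)  # [('x1', 'y2'), ('y1', 'x2')]
```

The single-element unpacking raises an error if a sensor reports more or fewer than two tracks, which keeps the sketch honest about its two-person assumption.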

In step 586, using the time series associated in this way, the calibration parameters for the position and orientation of the sensor being calibrated are calculated with whichever of the equations described above applies to its sensor type. The calibration parameters thus calculated are stored in the calibration parameter storage unit 186 in step 588.

By executing process 566 for all target sensors, the calibration parameters for converting coordinate values obtained in the local coordinates of every sensor other than the reference sensor into coordinate values in the global coordinates are saved in the calibration parameter storage unit 186.

In this embodiment, the parameters of the encoder 350 also continue to be updated using the data output by each sensor even after the calibration parameters have been determined as described above. FIG. 13 shows the control structure of the program that causes the computer system 290 to function as the calibration device background update system 184 for this purpose.

Referring to FIG. 13, when execution of this program starts, initial processing is performed in step 620. This initial processing includes reading the parameters of the autoencoder 178 into the RAM 320 and securing a working storage area in the RAM 320.

Next, in step 622, process 624 is repeated until a termination condition is satisfied. Any condition can serve as the trigger for ending this processing, such as an instruction from the operator or the sound source localization device (not shown) ending its operation.

In process 624, first, the update data that the sound source localization device (not shown) or the like has collected from the sensors is received. Each set of this update data is assumed to span the same number of time steps as the training data.

Next, in step 642, process 644 is executed on all the update data received in step 640. Process 644 is almost the same as the training process.

In process 644, process 662 is executed on each set included in the update data, in order from the data of the first time step. At the start of process 662, process 682 is executed for each possible pair of time series in the set being processed. In step 700 of process 682, all the update data of that pair in that set is input to the encoder 350. In the following step 702, the output of the encoder 350 is sampled and held in the RAM 320.

By executing process 682 for all pairs in step 680, a sampled output of the encoder 350 is obtained for each pair being processed. In step 684, the pair with the largest sampled value is selected, and, from the measurement data of that pair, the vectors at the time step specified by step 660 are concatenated and input to the decoder 352. In the following step 686, the output of the decoder 352 is calculated. In step 688, the parameters of the autoencoder 178 stored in the parameter storage unit 180 are adjusted (updated) by error backpropagation using the error between the input to the decoder 352 and the output from the decoder 352.

When the predetermined termination condition is satisfied, this program ends. While the program is running, the parameters of the autoencoder 178 are updated in the background. Because the sound source localization device (not shown) and the like do not use the encoder 350 in their operation, updating the parameters of the autoencoder 178 in this way does not affect them. The change takes effect in the behavior of the encoder 350 the next time the online calibration device 182 performs calibration.

As described above, with the calibration system 150 according to this embodiment, once the training data has been generated automatically, the autoencoder 178 (and the encoder 350) is trained automatically without any human intervention. Likewise, when calibrating the sensors included in the calibration target acoustic processing system 162, the calibration parameters of each sensor can be calculated automatically simply by having two persons walk within the predetermined area to generate calibration data, without manually setting the actual position and orientation of each sensor. A further effect is that the calibration parameters can be calculated automatically not only for 3D sensors such as the RGB-D sensor 60 but also for an acoustic processing system that combines 2D sensors, such as microphone arrays, with 3D sensors.

3 Experiment
To test the performance of the calibration process using the autoencoder 178 according to the first embodiment, the following experiment was conducted. The experiment used the open data set used in Non-Patent Document 4. Gaussian noise with a standard deviation of 15 cm was added to each camera measurement. The initial position of the microphone array was set at random, and the target angle relative to the microphone array was generated with Gaussian noise with a mean of 0 and a standard deviation of 2.
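The noise injection described for the camera measurements can be sketched with the standard library alone. The sketch assumes positions are expressed in meters, so that 15 cm corresponds to `sigma = 0.15`, and that noise is added independently to each coordinate; neither detail is stated in the text, and the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def add_gaussian_noise(points: Sequence[Sequence[float]],
                       sigma: float,
                       rng: random.Random) -> List[Tuple[float, ...]]:
    """Add zero-mean Gaussian noise with standard deviation sigma to every coordinate."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]


rng = random.Random(0)  # fixed seed so the perturbation is reproducible
clean = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
noisy = add_gaussian_noise(clean, 0.15, rng)
```

Passing an explicit `random.Random` instance keeps the perturbation reproducible across runs, which is convenient when comparing calibration errors.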

First, five collections of measurement data were prepared for training the autoencoder 178. Each collection contained sets of time series data spanning 100 time steps.

When person identification was performed using the encoder 350 trained in this experiment, the computer 300 correctly determined that different persons were different in 98.3% of cases. Calibration using this computer 300 yielded an average error of 22 mm for the RGB-D sensor and 57 mm for the microphone array.

Thus, according to the first embodiment, both the RGB-D sensor and the microphone array can be calibrated with high accuracy without manual work. The encoder 350 needed for calibration can likewise be trained without human intervention. This training is unsupervised learning, so there is no need to prepare training data by hand. For both training-data preparation and calibration, all that is required is for a predetermined number of people to walk around a predetermined area and, when a microphone array is among the targets, to speak as appropriate.

2. Second Embodiment
1 Configuration
(1) Overall configuration
The second embodiment relates to an online calibration device training system. Unlike the online calibration device training system 174 shown in FIG. 5, the online calibration device training system according to the second embodiment trains the autoencoder 178 by error backpropagation in mini-batches.

(2) Training of the encoder
FIG. 14 shows, in flowchart form, the control structure of the program for training the autoencoder 178 in the second embodiment (a program that causes the computer system 290 to function as the online calibration device training system). Referring to FIG. 14, this program differs from the one shown in FIG. 11 in that, in place of step 504 of FIG. 11, it includes step 720 of repeating process 722 a predetermined number of times over all the training data.

Process 722 includes step 740 of dividing all the training data into a predetermined number of mini-batches, and step 742 of executing process 746 on these mini-batches in order, starting from the first.

Step 742 includes step 760 of accumulating the error obtained from each piece of training data in the mini-batch by executing process 746 on each set of training data in the target mini-batch in order from the first, and step 764 of adjusting the parameters of the autoencoder 178 by error backpropagation using the error accumulated in step 760 and ending process 746.

Process 746 includes step 760 of calculating the error accumulated for a mini-batch by executing process 762 on each set of training data in the mini-batch in order from the first, and step 764 of adjusting the parameters of the autoencoder 178 by error backpropagation using the error accumulated in step 760 and ending process 746.

Process 762 includes: step 780 of executing process 782, which samples the score output by the encoder 350, for each possible pair of time series in the set being processed; step 784 of inputting to the decoder 352 the measurement data, at the time step specified by step 760, of the time series of the pair with the highest score sampled in step 780; step 786 of calculating the output of the decoder 352 in response to the input in step 784; and step 788 of accumulating the error between the output and the input of the decoder 352 and ending process 762.

Process 782 includes step 800, which inputs to encoder 350 all of the position data of the pair of time-series data being processed within the set being processed, and step 802, which samples and stores the output of encoder 350 corresponding to the input of step 800 and then ends process 782 for that pair.

2 Operation
(1) Overall flow of operation
The overall flow of training-time operation of the online calibration device training system according to this second embodiment differs from that of the first embodiment in that error backpropagation is applied per mini-batch rather than per set of training data. In all other respects, the online calibration device training system according to the second embodiment operates in the same manner as online calibration device training system 174 of the first embodiment.

(2) Encoder training
Referring to FIG. 14, the online calibration device training system according to the second embodiment performs initial processing in step 500. Next, in step 502, the training data are read from training data storage unit 172 (see FIG. 3) and loaded into RAM 320 (see FIG. 7).

Next, computer system 290 repeats process 722 on all of the training data a predetermined number of times.

In each repetition of process 722, all of the training data are divided into mini-batches (step 740), and process 746 is executed on the mini-batches in order, starting from the first. This processing adjusts the parameters of autoencoder 178.
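As an illustration of step 740, the following Python sketch divides a list of training-data sets into a predetermined number of mini-batches, which are then visited in order from the first. The function name `split_into_minibatches` and the choice to make the mini-batches as equal in size as possible are assumptions; the description does not specify how the division is performed.

```python
def split_into_minibatches(training_sets, num_minibatches):
    """Divide training_sets into num_minibatches contiguous mini-batches.

    Sizes differ by at most one and the original order is preserved.
    """
    if num_minibatches <= 0:
        raise ValueError("num_minibatches must be positive")
    base, extra = divmod(len(training_sets), num_minibatches)
    minibatches = []
    start = 0
    for i in range(num_minibatches):
        size = base + (1 if i < extra else 0)  # the first `extra` batches get one more set
        minibatches.append(training_sets[start:start + size])
        start += size
    return minibatches


# Ten hypothetical training sets, identified here only by index.
minibatches = split_into_minibatches(list(range(10)), 3)
for index, minibatch in enumerate(minibatches):
    print(index, minibatch)
```

Keeping the sets in their original order means that "in order from the first mini-batch" corresponds to a plain left-to-right pass over the returned list.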

In process 746, process 762 is executed on each set of training data in the mini-batch being processed, in order from the first set (step 760), thereby accumulating the error for that mini-batch.

Specifically, in process 746, for each possible pair of the time-series data constituting the set, all of the position data constituting the time-series data are first input to encoder 350 (step 800), and a score indicating whether the persons represented by the pair of time-series data input to encoder 350 are the same person is sampled (step 802). Executing this processing yields a score for every possible pair of time-series data in the set.
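The pair scoring of steps 800 and 802 can be sketched as below. The function `score_pair` is a hypothetical stand-in for encoder 350: instead of a trained neural network it returns the negative mean distance between two tracks, so that tracks of the same person score higher. The track names and coordinates are made-up illustration data.

```python
from itertools import combinations
from math import dist


def score_pair(track_a, track_b):
    """Stand-in for encoder 350 (assumption): higher means more likely the same person."""
    distances = [dist(p, q) for p, q in zip(track_a, track_b)]
    return -sum(distances) / len(distances)


def score_all_pairs(tracks):
    """Return {(name_a, name_b): score} for every possible pair of time series in one set."""
    return {
        (a, b): score_pair(tracks[a], tracks[b])
        for a, b in combinations(sorted(tracks), 2)
    }


# One hypothetical set: three time series of (x, y) positions over three time steps.
tracks = {
    "s1_p1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "s2_p1": [(0.1, 0.0), (1.1, 0.0), (2.1, 0.0)],
    "s2_p2": [(5.0, 5.0), (5.0, 4.0), (5.0, 3.0)],
}
scores = score_all_pairs(tracks)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

With n time series in a set, `combinations` yields n(n-1)/2 pairs, which is the number of encoder evaluations needed per set.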

Next, in step 784, the pair of time series corresponding to the highest of the scores computed in step 760 is selected, and from each of the time-series data constituting the pair, the position data at the time step in the order specified in step 760 are selected and input to decoder 352. In step 786, the output of decoder 352 is computed, and in step 788, the error between the input and the output of decoder 352 is computed and accumulated.
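Steps 784 to 788 can be illustrated as follows. Given the pair scores, the sketch selects the highest-scoring pair, takes the position data of both tracks at one time step, passes them through a decoder, and adds the squared difference between decoder input and output to a running total. The `decoder` here is a hypothetical placeholder that merely scales its input, and the use of squared error is an assumption; the description states only that the error between input and output is accumulated.

```python
def select_best_pair(scores):
    """Return the pair whose score is highest (step 784)."""
    return max(scores, key=scores.get)


def decoder(inputs, gain=0.9):
    """Placeholder for decoder 352 (assumption): scales each input value by `gain`."""
    return [gain * value for value in inputs]


def accumulate_error(tracks, scores, time_step, accumulated=0.0):
    """Add the squared reconstruction error for one time step to `accumulated`."""
    name_a, name_b = select_best_pair(scores)
    # Decoder input: position data of both tracks at the same time step.
    inputs = [*tracks[name_a][time_step], *tracks[name_b][time_step]]
    outputs = decoder(inputs)                                   # step 786
    error = sum((o - i) ** 2 for o, i in zip(outputs, inputs))  # step 788
    return accumulated + error


# Hypothetical tracks and scores for one set.
tracks = {
    "a": [(1.0, 2.0), (2.0, 2.0)],
    "b": [(1.0, 2.0), (2.0, 2.0)],
    "c": [(9.0, 9.0), (8.0, 9.0)],
}
scores = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.2}

total = 0.0
for t in range(2):
    total = accumulate_error(tracks, scores, t, total)
print(select_best_pair(scores), total)
```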

Executing process 762 in this way accumulates the error for the mini-batch being processed, as specified in step 742. Using this error, step 764 adjusts the parameters of autoencoder 178 by error backpropagation. At this point, the accumulated error for the mini-batch is cleared.
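The accumulate, update and clear cycle for each mini-batch can be illustrated with a deliberately tiny model. In this sketch a single parameter `w` is fitted so that `w * x` reproduces `x`; the gradient of the squared error is summed over every sample in a mini-batch, one gradient-descent update is applied, and the accumulators are reset. The model, the learning rate and the data are assumptions for illustration, and plain gradient descent stands in for error backpropagation through autoencoder 178.

```python
def train(w, minibatches, learning_rate=0.01):
    """Update w once per mini-batch using the gradient accumulated over that mini-batch."""
    history = []
    accumulated_error = 0.0
    accumulated_gradient = 0.0
    for minibatch in minibatches:
        for x in minibatch:
            residual = w * x - x                 # model output minus target (the input itself)
            accumulated_error += residual ** 2
            accumulated_gradient += 2.0 * residual * x
        w -= learning_rate * accumulated_gradient    # one update per mini-batch
        history.append(accumulated_error)
        accumulated_error = 0.0                  # clear before the next mini-batch
        accumulated_gradient = 0.0
    return w, history


# Two hypothetical mini-batches of scalar samples.
w, history = train(0.0, [[1.0, 2.0], [3.0, 1.0]])
print(w, history)
```

Forgetting to clear the accumulators would carry the first mini-batch's error into the second update, which is why the reset is done immediately after the parameter adjustment.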

Once the processing up to step 764 has been executed for every mini-batch selected in turn in step 742, one round of training of autoencoder 178 using all of the training data is complete. Repeating this training the number of times specified in step 720 completes the training of autoencoder 178. The parameters of both encoder 350 and decoder 352 of autoencoder 178 obtained in this way are stored in parameter storage unit 180 of FIG. 5.

This completes the training of the online calibration device training system according to the second embodiment. With this online calibration device training system as well, sensors can be calibrated in the same manner as with online calibration device 182 according to the first embodiment.

The embodiments disclosed herein are merely illustrative, and the present invention is not limited to the embodiments described above. The scope of the present invention is indicated by each claim, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording recited therein.

50 Sensor system
60, 62 RGB-D sensor
64 Microphone array
66, 68 Target person
80, 82 Local coordinates
100 Graph
120, 122, 124, 126 Edge
150 Calibration system
160 Training data generation unit
162 Acoustic processing system to be calibrated
170 Training data collection device
172 Training data storage unit
174 Online calibration device training system
176, 178 Autoencoder
180 Parameter storage unit
182 Online calibration device
184 Calibration device background update system
186 Calibration parameter storage unit
290 Computer system
300 Computer
302 Monitor
304 Network I/F
306 Keyboard
308 Mouse
310 DVD drive
312 USB memory port
314 Hard disk
316 CPU
317 GPU
318 ROM
320 RAM
322 DVD
326 Bus
328 Network
330 USB memory
350 Encoder
352, 460 Decoder
354 Probability distribution
390, 392, 394, 420, 422, 424, 450, 452, 454 Combination
400 Message-passing neural network
410, 412 Neural network
440, 442, 444 Value
470, 472, 474 Score

Claims

A calibration device for calibrating the positions and orientations of a first sensor and a second sensor, each of which can detect and output the positions of a plurality of moving bodies as a discrete time series, the calibration device comprising:
an acquisition unit that acquires first time-series data and second time-series data regarding the positions of a predetermined number of moving bodies, measured over a predetermined time by the first sensor and the second sensor, respectively;
moving body identification means comprising a neural network that takes the first time-series data and the second time-series data as input and that is pre-trained so that, for each combination of a first moving body represented by the first time-series data and a second moving body represented by the second time-series data, it receives as input the time-series data of the position of the first moving body in the first time-series data and the time-series data of the position of the second moving body in the second time-series data, and outputs a score indicating whether the first and second moving bodies forming the combination are the same moving body; and
sensor calibration means that estimates, based on the output of the moving body identification means, the correspondence between each moving body represented by the first time-series data and each moving body represented by the second time-series data, and that uses the correspondence to calibrate the position and orientation of the second sensor relative to the first sensor so that the output error between the first sensor and the second sensor for each moving body satisfies a predetermined condition.

The calibration device according to claim 1, wherein the sensor calibration means includes a minimization means for calibrating the position and orientation of the second sensor so as to minimize the sum of the output errors.

The calibration device according to claim 1 or claim 2, further comprising parallel training means that uses the first time-series data and the second time-series data to train the moving body identification means in parallel with the operation of the correspondence estimation means.

The calibration device according to claim 3, wherein the parallel training means includes:
the moving body identification means;
a decoder that receives as input the output of the moving body identification means and the position data of each of the first time-series data and the second time-series data at the same time step; and
adjusting means for training the moving body identification means by adjusting the parameters of the moving body identification means and the decoder so that, over a predetermined range of the first time-series data and the second time-series data, the output of the decoder approaches the position data of the same time step input to the decoder.

The calibration device according to claim 4, wherein the adjusting means includes error backpropagation means for training the moving body identification means by adjusting the parameters of the moving body identification means and the decoder by error backpropagation, using the first time-series data and the second time-series data over the entire predetermined time and using the error between the output of the decoder and the position data of the same time step input to the decoder.

The calibration device according to any one of claims 3 to 5, wherein the parallel training means trains the moving body identification means each time the first time-series data and the second time-series data are given.

A moving body identification device comprising a neural network pre-trained to take as input first time-series data and second time-series data regarding the positions of a first moving body and a second moving body, measured over a predetermined time by a first sensor and a second sensor, respectively, and to output a score indicating whether the first moving body and the second moving body are the same moving body.

The moving body identification device according to claim 7, wherein each of the first time-series data and the second time-series data includes position data of the target moving body at predetermined time intervals, and
each item of the position data includes the position and velocity of the target moving body and time information indicating the time at which the position and velocity were measured.

The moving body identification device according to claim 8, wherein the neural network is composed of a plurality of layers and has a plurality of inputs that receive the position and velocity included in the first time-series data and the position and velocity included in the second time-series data, and an output that outputs the probability.

A training device comprising:
a time-series data acquisition unit that acquires, for each of a plurality of moving bodies, a time series of position data obtained at predetermined time steps over a predetermined time;
position data extraction means for extracting, from the time series of position data acquired by the time-series data acquisition unit, the position data acquired at the same time, in a specified order;
a first neural network having a number of inputs determined by the number of the predetermined time steps, and at least one output;
a second neural network whose inputs and outputs are equal in number, the number being determined by the position data constituting the time series;
input means that extracts every possible combination of two moving bodies from the plurality of moving bodies and gives to the first neural network, as its input, the time series of position data of the moving bodies constituting the extracted combination;
first sampling means that samples the value output by the first neural network in response to the input;
selection means for selecting, from the values sampled by the first sampling means for each of the possible combinations, the combination for which the largest value was obtained;
second sampling means that inputs to the second neural network, from among the position data extracted by the position data extraction means, the position data of the two moving bodies corresponding to the combination selected by the selection means, and samples the output of the second neural network;
parameter adjusting means for adjusting the respective parameters of the first neural network and the second neural network by error backpropagation so as to reduce the error between the position data of the two moving bodies given to the input of the second neural network and the value sampled by the second sampling means from the output of the second neural network;
first repeat execution means that causes the position data extraction means, the first neural network, the input means, the first sampling means, the selection means, the second sampling means and the parameter adjusting means to operate repeatedly, specifying position data in order from the beginning of the time series of position data, until the time-series data end;
second repeat execution means that repeats the repetition by the first repeat execution means until a predetermined end condition is satisfied; and
parameter storage means for storing, in a predetermined storage device, the parameters of the first neural network at the time the repetition by the second repeat execution means is completed.

The training device according to claim 10, wherein the parameter adjusting means includes:
error accumulating means that accumulates, a predetermined number of times, the error between the position data of the two moving bodies given to the input of the second neural network and the value sampled by the second sampling means from the output of the second neural network; and
batch adjusting means that, after the first sampling means and the second sampling means have operated the predetermined number of times, adjusts the respective parameters of the first neural network and the second neural network in a batch by error backpropagation so as to reduce the error accumulated by the error accumulating means.

A computer program that causes a computer to function as the calibration device according to any one of claims 1 to 6.

A computer program that causes a computer to function as the moving body identification device according to any one of claims 7 to 9.

A computer program that causes a computer to function as the training device according to claim 10 or 11.