JP7034035B2

JP7034035B2 - Motion generation method for autonomous learning robot device and autonomous learning robot device

Info

Publication number: JP7034035B2
Application number: JP2018156175A
Authority: JP
Inventors: 洋伊藤; 健次郎山本
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2018-08-23
Filing date: 2018-08-23
Publication date: 2022-03-11
Anticipated expiration: 2038-08-23
Also published as: CN112638596A; CN112638596B; WO2020039616A1; JP2020028950A

Description

The present invention relates to a robot device that includes a machine learning device or is electrically (communicably) connected to one. In particular, it relates to an autonomous learning robot device in which the robot generates motions based on external sensor information, and to a motion generation method for such a device.

Conventional robot systems require extensive programming and a high degree of specialized knowledge, which hinders the introduction of robots. Autonomous learning robot devices have therefore been proposed, in which the robot itself determines its motion based on information from various sensors attached to the robot device. By storing and learning the robot's own motion experiences, such a device is expected to be able to generate motions flexibly in response to a wide range of environmental changes.

A robot acquires motion experience by, for example, having an operator or user teach it a motion directly and storing that motion, or by observing and imitating the motion of a person or another robot.
In general, an autonomous learning robot device is equipped with a learning device called a learner, which stores the sensor information recorded during a motion experience and adjusts the parameters used to generate motions. The stored motions are called learning data and the parameter adjustment is called learning; the learner is trained using the learning data. The input/output relationship of the learner is defined in advance, and learning is repeated until the learner outputs the expected value for each input value.
For example, suppose the robot's joint angle information during a motion experience is stored as time-series information, and the learner is trained on this learning data to take the joint angle information at time (t) as input and predict the joint angle information at the next time (t + 1). Once training is complete, sequentially inputting the robot's joint angle information into the learner allows the autonomous learning robot device to generate motions automatically in response to changes in the environment or in its own state.
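The next-step prediction setup described above can be illustrated with a minimal sketch. It assumes the recorded joint angles are stored as a list of per-time vectors. The helper names `make_next_step_pairs` and `rollout` and the stand-in predictor are hypothetical and do not come from the patent. The rollout feeds predictions back in place of measured joint angles purely so that the sketch runs without a robot.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def make_next_step_pairs(series: Sequence[Vector]) -> List[Tuple[Vector, Vector]]:
    """Pair the joint angles at time t (input) with those at time t + 1 (target)."""
    return [(list(series[t]), list(series[t + 1])) for t in range(len(series) - 1)]


def rollout(predict: Callable[[Vector], Vector], start: Vector, steps: int) -> List[Vector]:
    """Generate a motion by feeding each prediction back in as the next input."""
    trajectory = [list(start)]
    for _ in range(steps):
        trajectory.append(predict(trajectory[-1]))
    return trajectory


# Hypothetical recording: two joints sampled at four time steps (radians).
recorded = [[0.0, 0.0], [0.1, 0.05], [0.2, 0.10], [0.3, 0.15]]
pairs = make_next_step_pairs(recorded)


def toy_predictor(angles: Vector) -> Vector:
    """Stand-in for a trained learner (assumption): adds a fixed per-step increment."""
    return [angles[0] + 0.1, angles[1] + 0.05]


generated = rollout(toy_predictor, recorded[0], steps=3)
```

In a real system the learner would replace `toy_predictor`, and the measured joint angles would be supplied at each step instead of the previous prediction.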

Techniques for dynamically generating motions in response to changes in the environment or in the robot's own state are known from, for example, Patent Document 1 and Non-Patent Document 1.
Patent Document 1 aims to provide a robot task learning device that automatically corrects a motion so that the target task succeeds. It is intended for motion planning and control of robots in situations where faithfully reproducing the motion pattern of a task performed by a human or the like is not enough for the task to succeed, or where real-time motion correction cannot cope. To this end, the task learning device includes: an input device that implements measuring means for measuring the motion of a human or the like during the task; a waypoint extraction device that extracts waypoints from the input data; a planned trajectory generation device that plans the movement to be realized by the robot device; a motion command generation device that sends command values to the robot so as to realize the planned trajectory; a robot device that performs the task; a task result extraction device that extracts the task result from the task actually performed by the robot device or by a simulator; and a waypoint correction device that evaluates the degree of task achievement from the obtained task result and the task goal, and corrects the waypoints so as to improve it.
Non-Patent Document 1 discloses generating motions by memory learning of visuomotor time series obtained from multiple object manipulation behaviors of a robot.

Patent Document 1: Japanese Unexamined Patent Publication No. Hei 8-314522

Non-Patent Document 1: Kuniaki Noda, Hiroaki Arie, Yuki Suga, and Tetsuya Ogata: Multimodal Integration Learning of Robot Behavior using Deep Neural Networks, Robotics and Autonomous Systems, Vol. 62, No. 6, pp. 721-736, 2014.

However, the configuration disclosed in Patent Document 1 corrects only a single motion taught in advance, which makes it difficult to generate multiple motion patterns or to switch to another motion pattern partway through motion generation. In other words, Patent Document 1 gives no consideration to motion patterns of different types.
In the configuration disclosed in Non-Patent Document 1, multiple motion patterns are learned by a single learner, and motion is generated based on the motion pattern selected immediately after the motion starts. Dynamic trajectory correction and switching of motion patterns in response to environmental changes are therefore difficult.

The present invention therefore provides an autonomous learning robot device, and a motion generation method for an autonomous learning robot device, that are robust to changes in the robot's state and environment and can execute motion patterns of different types.

To solve the above problem, an autonomous learning robot device according to the present invention comprises a robot device having at least a control unit, and a machine learning device electrically or communicably connected to the robot device. The machine learning device comprises: a waypoint extraction unit that extracts motion waypoints of the robot device from sensor information, measured by a sensor unit, that includes the state of the robot device and environment information; a motion pattern selection unit that learns motion patterns for each predetermined time width from the waypoints extracted by the waypoint extraction unit, and selects a motion pattern based on the sensor information; a motion pattern generation unit that learns motion patterns of the robot for each predetermined time width from the waypoints extracted by the waypoint extraction unit, generates a motion pattern based on the sensor information and the motion pattern selected by the motion pattern selection unit, and outputs it as a motion command to the control unit of the robot device; and a state determination unit that compares the motion pattern generated by the motion pattern generation unit with the sensor information, and determines the timing at which the motion pattern is output to the control unit of the robot device.

A motion generation method according to the present invention is a method for an autonomous learning robot device comprising a robot device having at least a control unit, and a machine learning device electrically or communicably connected to the robot device. In this method, a waypoint extraction unit extracts motion waypoints of the robot device from sensor information, measured by a sensor unit, that includes the state of the robot device and environment information. A motion pattern selection unit learns motion patterns for each predetermined time width from the extracted waypoints and selects a motion pattern based on the sensor information. A motion pattern generation unit learns motion patterns of the robot for each predetermined time width from the extracted waypoints, generates a motion pattern based on the sensor information and the motion pattern selected by the motion pattern selection unit, and outputs it as a motion command to the control unit of the robot device. A state determination unit compares the motion pattern generated by the motion pattern generation unit with the sensor information, and determines the timing at which the motion pattern is output to the control unit of the robot device.

According to the present invention, it is possible to provide an autonomous learning robot device, and a motion generation method for an autonomous learning robot device, that are robust to changes in the robot's state and environment and can execute motion patterns of different types.
Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1 is an overall schematic configuration diagram of an autonomous learning robot device according to one embodiment of the present invention.
FIG. 2 is a diagram showing an example of motion teaching using the autonomous learning robot device shown in FIG. 1.
FIG. 3 is a diagram showing a method of extracting the waypoints of a taught motion.
FIG. 4 is a diagram showing an example of motion generation using the autonomous learning robot device shown in FIG. 1.
FIG. 5 is a diagram explaining the learning method of the motion pattern selection unit and the motion pattern generation unit constituting the machine learning device shown in FIG. 1.
FIG. 6 is a diagram explaining a method of dividing the learning data by a predetermined window width and slide size for learning.
FIG. 7 is a flowchart showing the processing flow during learning of the autonomous learning robot device shown in FIG. 1.
FIG. 8 is a flowchart showing the processing flow during operation of the autonomous learning robot device shown in FIG. 1.
FIG. 9 is a diagram showing the flow of data during operation of the autonomous learning robot device shown in FIG. 1.

In this specification, the term robot device includes, for example, humanoid robots, cranes, machine tools, and autonomous vehicles. This specification also covers autonomous learning robot devices in which the machine learning device is implemented in the cloud (on a server) and connected to the robot device via a communication network, whether wired or wireless. In that case, configurations in which a plurality of different robot devices are electrically (communicably) connected to a single machine learning device are also included.
For ease of explanation, the following describes, as an example, an autonomous learning robot device composed of a machine learning device and a robot device having a robot arm. The form of the autonomous learning robot device is not limited to this.
Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is an overall schematic configuration diagram of an autonomous learning robot device according to one embodiment of the present invention. As shown in FIG. 1, the autonomous learning robot device 1 is composed of a robot device 2 and a machine learning device 3. The robot device 2 has a control unit 11 that controls each drive unit of the robot based on motion commands, and a sensor unit 12 that measures various sensor information representing the state quantities of the robot. The control unit 11 is implemented by, for example, a processor such as a CPU (Central Processing Unit), a ROM that stores various programs, a RAM that temporarily stores intermediate data, and a storage device such as an external storage device (none of which are shown). The processor reads and executes the programs stored in the ROM and stores the results in the RAM or the external storage device.

The machine learning device 3 includes a waypoint extraction unit 21, which extracts motion waypoints of the robot based on the sensor information measured by the sensor unit 12, and a motion pattern selection unit 22, which classifies the robot's motion patterns based on the waypoints extracted by the waypoint extraction unit 21 and selects a motion pattern based on the sensor information measured by the sensor unit 12 and a command from the state determination unit 24. The machine learning device 3 also has a motion pattern generation unit 23, which learns the robot's motion patterns based on the extracted waypoints and generates the motion pattern selected by the motion pattern selection unit 22, and a state determination unit 24, which compares the motion generated by the motion pattern generation unit 23 with the sensor information measured by the sensor unit 12 and determines the operation timing of the motion pattern generation unit 23 by sending it a motion command via the motion pattern selection unit 22.
The waypoint extraction unit 21, the motion pattern selection unit 22, the motion pattern generation unit 23, and the state determination unit 24 are implemented by, for example, a processor such as a CPU, a ROM that stores various programs, a RAM that temporarily stores intermediate data, and a storage device such as an external storage device (none of which are shown). The processor reads and executes the programs stored in the ROM and stores the results in the RAM or the external storage device. Although the units are shown as separate functional blocks for ease of explanation, the waypoint extraction unit 21, the motion pattern selection unit 22, the motion pattern generation unit 23, and the state determination unit 24 may be implemented as a single arithmetic unit, or any desired functional blocks may be combined.

Next, a concrete example is described in which the autonomous learning robot device 1 shown in FIG. 1 is applied to a robot device 2 composed of one camera and a robot arm (not shown), and the robot device 2 is made to learn an object grasping motion.
The control unit 11 of the robot device 2 drives each drive unit (not shown) of the robot arm using PID control or the like, based on motion commands from the machine learning device 3. The sensor unit 12 measures the camera image, which is the robot's visual information, and each joint angle of the robot arm. Sensors used in the sensor unit 12 include, for example, potentiometers, encoders, cameras, and ammeters. When the joints of the robot arm are driven by motors, each joint angle is measured with a potentiometer, an encoder, or the current supplied to the motor. When the joints are driven by something other than a motor, for example an actuator, it is preferable to compute the joint angles by applying image processing to the images captured by the camera.
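The text names PID control only as one possible way for the control unit 11 to follow a joint-angle command. The sketch below shows a generic discrete PID loop for a single joint. The gains, the control period, and the toy joint model (the command is treated as an angular velocity) are all assumptions for illustration and are not taken from the patent.

```python
class PID:
    """Discrete PID controller for one joint (gains are illustrative assumptions)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, target: float, measured: float) -> float:
        """Return the drive command for one control period."""
        error = target - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Toy joint model (assumption): the command acts as an angular velocity.
# Only the proportional gain is non-zero here, to keep the demo easy to follow.
pid = PID(kp=2.0, ki=0.0, kd=0.0, dt=0.01)
angle = 0.0
target_angle = 1.0  # joint-angle command, e.g. from the machine learning device
for _ in range(1000):
    command = pid.step(target_angle, angle)
    angle += command * pid.dt
```

With these values the tracking error shrinks by a factor of 0.98 per period, so the simulated joint settles on the commanded angle.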

The waypoint extraction unit 21 extracts waypoints using the various sensor information measured by the sensor unit 12 while an object grasping motion is taught by any motion teaching method, such as direct teaching or master-slave operation. FIG. 2 shows an example of motion teaching using the autonomous learning robot device shown in FIG. 1. In this embodiment, as shown in FIG. 2, grasping motions for objects with different initial positions (object A and object B) are each taught multiple times, and each measured time series is discretized using a desired interpolation method (linear interpolation, Lagrange interpolation, spline interpolation, or the like). Because some time-series data, such as images captured by a camera, are difficult to discretize with these interpolation methods, the discretization is performed so that the number and times of the extracted waypoints are the same across all sensors. In the teaching example of FIG. 2, the robot hand attached to the tip of the robot arm is taught motion A for object A, placed at a certain position, as the sequence: (1) extend the arm, (2) grasp object A, and (3) return while holding object A. Motion B, for object B placed at a different position from object A, is taught as the sequence: (1) extend the arm, (2) grasp object B, and (3) return while holding object B.

FIG. 3 shows a method of extracting the waypoints of a taught motion. For example, given multiple sensor time series D_trj and a waypoint set {D_{via,j}, i = 1, ..., N}, and extracting seven waypoints per sensor, the joint angle information is extracted as shown in the graph of FIG. 3 with time on the horizontal axis and joint angle on the vertical axis. The camera images are extracted at the corresponding times, as shown in the graph with time on the horizontal axis and image on the vertical axis. The number of waypoints is not limited to seven and can be set as desired. If the number of waypoints equals the length of the time series, this is equivalent to using the entire time series.
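One way to picture waypoint extraction with a shared set of times is the following sketch. It assumes linear interpolation for the joint angle and, because images cannot be interpolated this way, picks the recorded camera frame nearest to each waypoint time. The function names, sampling rates, and the nearest-frame rule are illustrative assumptions, not details given in the patent.

```python
from bisect import bisect_left
from typing import List, Sequence


def waypoint_times(t_start: float, t_end: float, count: int) -> List[float]:
    """Evenly spaced waypoint times shared by every sensor."""
    step = (t_end - t_start) / (count - 1)
    return [t_start + i * step for i in range(count)]


def interpolate(times: Sequence[float], values: Sequence[float], query: float) -> float:
    """Linearly interpolate a scalar signal (e.g. one joint angle) at `query`."""
    if query <= times[0]:
        return values[0]
    if query >= times[-1]:
        return values[-1]
    hi = bisect_left(times, query)
    lo = hi - 1
    ratio = (query - times[lo]) / (times[hi] - times[lo])
    return values[lo] + ratio * (values[hi] - values[lo])


def nearest_index(times: Sequence[float], query: float) -> int:
    """Index of the recorded sample closest in time (used for camera frames)."""
    return min(range(len(times)), key=lambda i: abs(times[i] - query))


# Hypothetical recording: one joint angle at 10 Hz for 3 s, camera frames at 1 Hz.
joint_times = [i * 0.1 for i in range(31)]
joint_angles = [0.5 * t for t in joint_times]  # a simple ramp
frame_times = [0.0, 1.0, 2.0, 3.0]

via_times = waypoint_times(0.0, 3.0, count=7)  # seven waypoints per sensor
via_angles = [interpolate(joint_times, joint_angles, t) for t in via_times]
via_frames = [nearest_index(frame_times, t) for t in via_times]
```

Both sensors end up with seven waypoints at identical times, which is the property the discretization is meant to guarantee.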

The motion pattern selection unit 22 and the motion pattern generation unit 23 perform learning based on the waypoint information extracted by the waypoint extraction unit 21. In this embodiment, as an example, the motion pattern selection unit 22 and the motion pattern generation unit 23 use a neural network, one of the artificial intelligence technologies. By sliding a window of a desired time width over the extracted waypoints in steps of a desired size, they can be made to learn a variety of motion patterns (extending the arm, grasping, and so on). A neural network that has learned a variety of information can estimate appropriate outputs for unknown inputs based on its past learning experience. Therefore, when a neural network is used to learn object grasping motions, training it on the grasping motions for object A and object B shown in FIG. 2 makes it possible to grasp object C at an untaught position, as shown in FIG. 4.

FIG. 5 illustrates the learning method of the motion pattern selection unit 22 and the motion pattern generation unit 23 constituting the machine learning device 3 shown in FIG. 1, and FIG. 6 illustrates how the learning data is divided by a predetermined window width and slide size for learning. FIG. 5 shows the learning procedure of the motion pattern selection unit 22 and the motion pattern generation unit 23 when an object grasping motion is learned with a window width of 3 and a slide size of 1. Before describing FIG. 5, the learning method based on window width and slide size is explained with reference to FIG. 6.

FIG. 6 shows, as an example, the case where an object grasping motion is learned with a window width of 10 and a slide size of 5. The graph in the upper part of FIG. 6 has time on the horizontal axis and sensor value on the vertical axis, and assumes that the learning data is, for example, a time series of the robot's joint angles. As shown there, the window width is a predetermined time width, here W = 10, and the window extracts partial data X^1 from the learning data. Partial data X^2 and X^3 are extracted in the same way with the same window width (W = 10). Adjacent partial data, X^1 and X^2 or X^2 and X^3, are offset from each other by a predetermined time. That is, each pair of adjacent partial data is slid by a predetermined delay, and the upper part of FIG. 6 shows the case where the slide size is S = 5.
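The window width and slide size can be sketched as a simple slicing routine. The function name `split_windows` and the series length of 20 samples are assumptions chosen so that W = 10 and S = 5 yield exactly three partial sequences, matching X^1 to X^3 in the description.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_windows(data: Sequence[T], width: int, slide: int) -> List[List[T]]:
    """Cut `data` into partial sequences of length `width`, each starting
    `slide` samples after the previous one."""
    if width <= 0 or slide <= 0:
        raise ValueError("width and slide must be positive")
    return [list(data[start:start + width])
            for start in range(0, len(data) - width + 1, slide)]


series = list(range(20))  # stand-in for 20 sensor samples
x1, x2, x3 = split_windows(series, width=10, slide=5)
```

Here `x1` covers samples 0 to 9, `x2` covers 5 to 14, and `x3` covers 10 to 19, so adjacent windows overlap by W - S = 5 samples.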

The method of dividing the learning data by a predetermined window width and slide size for learning is outlined below.
First, as shown in the upper part of FIG. 6, the time-series learning data is divided using a predetermined window width (W = 10) and slide size (S = 5).
The following three steps are then executed for each partial data.
In step 1, the sensor information (sensor values) from time t = 0 to time t = W is input to the motion pattern generation unit 23, as shown in FIG. 6, and the error L^*_t at each time is calculated. Here * denotes the index of the partial data. The error calculation is described later.
In step 2, the total error L^* of the partial data is calculated from the errors L^*_t at each time.
In step 3, the weight parameters of the motion pattern generation unit 23 are updated using the total error L^* of each partial data.
Steps 1 to 3 are repeated until a specified number of iterations or a target error is reached.
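The three steps can be traced on a deliberately tiny model. The sketch below replaces the neural network with a single scalar weight `w` that predicts x_{t+1} as w * x_t, uses squared error as the per-time error, and takes the mean over the window as the total error. All of these choices, along with the learning rate and the fixed number of passes, are assumptions made only to keep the example short and runnable.

```python
from typing import List, Sequence, Tuple


def per_time_errors(w: float, window: Sequence[float]) -> List[float]:
    """Step 1: squared error at each time between w * x_t and x_{t+1}."""
    return [(w * window[t] - window[t + 1]) ** 2 for t in range(len(window) - 1)]


def one_pass(windows: Sequence[Sequence[float]], w: float,
             lr: float = 0.05) -> Tuple[float, List[float]]:
    """Run steps 1 to 3 once for every partial sequence."""
    totals = []
    for window in windows:
        errors = per_time_errors(w, window)            # step 1: per-time errors
        total = sum(errors) / len(errors)              # step 2: total error of the window
        grad = sum(2.0 * (w * window[t] - window[t + 1]) * window[t]
                   for t in range(len(window) - 1)) / len(errors)
        w -= lr * grad                                 # step 3: weight update
        totals.append(total)
    return w, totals


# Hypothetical data: x_{t+1} = 0.9 * x_t, divided with W = 10 and S = 5.
series = [0.9 ** t for t in range(20)]
windows = [series[s:s + 10] for s in range(0, len(series) - 10 + 1, 5)]

w = 0.0
for _ in range(300):  # a fixed number of passes, for simplicity
    w, totals = one_pass(windows, w)
```

After the passes, `w` is close to 0.9, the factor that generated the data, and the per-window totals are near zero.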

Returning to FIG. 5, the learning method of the motion pattern selection unit 22 and the motion pattern generation unit 23 constituting the machine learning device 3 is described. The upper part of FIG. 5 shows the learning of the motion pattern selection unit 22 and the motion pattern generation unit 23 at time t = 0, and the lower part shows their learning at time t = 1, for a window width of W = 3 and a slide size of S = 1. The motion pattern selection unit 22 is trained from the image at the earliest time in each window (each partial data described above), and the motion pattern generation unit 23 is trained from the selection result of the motion pattern selection unit 22 (the selected motion pattern S_pt) and three steps of sensor information. Specifically, in the upper part of FIG. 5, the image img_{t=0} captured by the camera at time t = 0, which is the robot's visual information, and the joint angles x_{t=0} to x_{t=2} of the robot arm are input, and the error value E between the estimated joint angles at the next times, x'_{t=1} to x'_{t=3}, and the true values x_{t=1} to x_{t=3} is calculated using equation (1).

Based on the calculated error value E, the weight parameter (W_c) of the neural network of the motion pattern selection unit 22 and the weight parameters (W_i, W_r, W_o) of the neural network of the motion pattern generation unit 23 are updated. As a result, the motion pattern selection unit 22 learns to extract the motion pattern matching the sensor information as an image feature, and the motion pattern generation unit 23 learns the motion pattern matching the sensor information.

In the learning at time t = 1 shown in the lower part of FIG. 5, the updated weight parameters from the upper part of FIG. 5 are used as the weight parameter (W_c) of the neural network of the motion pattern selection unit 22 and the weight parameters (W_i, W_r, W_o) of the neural network of the motion pattern generation unit 23, and the same processing as in the upper part of FIG. 5 is executed.
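To make the data flow for one window (W = 3) concrete, the following sketch runs a forward pass only. The network shapes are assumptions: the selection unit is reduced to a single tanh layer with weights W_c, and the generation unit to a simple recurrent network with input, recurrent, and output weights W_i, W_r, and W_o. The text does not give the form of equation (1), so a summed squared error stands in for E. Sizes, random initial weights, and function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def select_pattern(w_c: Matrix, image: Vector) -> Vector:
    """Selection unit (assumed form): map the first image of the window to S_pt."""
    return [math.tanh(z) for z in matvec(w_c, image)]


def generate(w_i: Matrix, w_r: Matrix, w_o: Matrix,
             s_pt: Vector, angles: List[Vector]) -> List[Vector]:
    """Generation unit (assumed simple recurrent net): predict the next joint angles."""
    hidden = [0.0] * len(w_r)
    predictions = []
    for x_t in angles:
        pre = [a + b for a, b in zip(matvec(w_i, x_t + s_pt), matvec(w_r, hidden))]
        hidden = [math.tanh(z) for z in pre]
        predictions.append(matvec(w_o, hidden))
    return predictions


def squared_error(predicted: List[Vector], truth: List[Vector]) -> float:
    """Stand-in for equation (1) (assumption): summed squared error."""
    return sum((p - q) ** 2 for ps, qs in zip(predicted, truth) for p, q in zip(ps, qs))


rng = random.Random(0)
n_pixels, n_feat, n_joints, n_hidden = 16, 4, 2, 8
w_c = random_matrix(n_feat, n_pixels, rng)
w_i = random_matrix(n_hidden, n_joints + n_feat, rng)
w_r = random_matrix(n_hidden, n_hidden, rng)
w_o = random_matrix(n_joints, n_hidden, rng)

img_t0 = [rng.random() for _ in range(n_pixels)]   # image at the earliest time of the window
x = [[0.1 * t, 0.05 * t] for t in range(4)]        # x_{t=0} .. x_{t=3}
s_pt = select_pattern(w_c, img_t0)
x_pred = generate(w_i, w_r, w_o, s_pt, x[0:3])     # x'_{t=1} .. x'_{t=3}
error = squared_error(x_pred, x[1:4])              # compared with x_{t=1} .. x_{t=3}
```

Training would backpropagate `error` into all four weight sets and then reuse the updated weights for the next window, as described for t = 1.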

Learning by the autonomous learning type robot device 1 of this embodiment will now be described in detail. FIG. 7 is a flowchart showing the processing flow during learning of the autonomous learning type robot device shown in FIG. 1. As shown in FIG. 7, in step S11, the waypoint extraction unit 21 constituting the machine learning device 3 extracts the waypoints D_via from the sensor time series data D_trj measured by the sensor unit 12.

In step S12, the waypoint extraction unit 21 initializes the neural networks in the operation pattern selection unit 22 and the operation pattern generation unit 23 constituting the machine learning device 3.
In step S13, the operation pattern generation unit 23 receives the waypoint D_{via,t} from the waypoint extraction unit 21 and calculates the output value D'_{via,t+1}.

In step S14, the waypoint extraction unit 21 calculates the error value E between the output value D'_{via,t+1} and the true value D_{via,t+1} using equation (1) above.
In step S15, if the calculated error value E is equal to or less than a preset target value, the waypoint extraction unit 21 proceeds to step S16 and ends learning. If the calculated error value E exceeds the preset target value, the process proceeds to step S17.

In step S17, the waypoint extraction unit 21 determines whether the learning count t is equal to or greater than the preset maximum learning count (learning count_max). If the learning count t is equal to or greater than the preset maximum, the process proceeds to step S16 and learning ends. If the learning count t is less than the preset maximum, the process proceeds to step S18.

In step S18, the waypoint extraction unit 21 updates the weight parameters (W_c, W_i, W_r, W_o) of the neural networks shown in FIG. 5. The process then proceeds to step S19, where the learning count is incremented by 1 (learning count_{t+1} = learning count_t + 1), and returns to step S13 to repeat the subsequent steps.
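The control flow of steps S12 to S19 can be sketched as follows. This is only an illustration under stated assumptions: a single scalar weight stands in for the neural network weight parameters, mean squared error stands in for equation (1) (which is not reproduced here), and plain gradient descent stands in for the weight update of step S18. The names `train`, `error_target`, `max_iterations`, and `learning_rate` are hypothetical.

```python
from typing import List, Tuple


def mean_squared_error(predicted: List[float], actual: List[float]) -> float:
    """Assumed error measure; the embodiment's equation (1) may differ."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def train(
    inputs: List[float],
    targets: List[float],
    error_target: float = 1e-6,
    max_iterations: int = 1000,
    learning_rate: float = 0.1,
) -> Tuple[float, float, int]:
    weight = 0.0      # S12: initialize the (stand-in) model
    iteration = 0     # learning count t
    while True:
        predicted = [weight * x for x in inputs]         # S13: compute outputs
        error = mean_squared_error(predicted, targets)   # S14: compute error E
        if error <= error_target:                        # S15: target reached, go to S16
            break
        if iteration >= max_iterations:                  # S17: count exhausted, go to S16
            break
        # S18: update the weight (assumed gradient-descent step on the MSE)
        gradient = sum(
            2 * (p - a) * x for p, a, x in zip(predicted, targets, inputs)
        ) / len(inputs)
        weight -= learning_rate * gradient
        iteration += 1                                   # S19: increment learning count
    return weight, error, iteration                      # S16: end of learning


# Hypothetical data in which each target is twice its input.
weight, error, iterations = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Learning ends on whichever condition is met first: the error falling to the target value or the learning count reaching its maximum.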

Next, the operation-time processing of the autonomous learning type robot device 1 of this embodiment will be described in detail. FIG. 8 is a flowchart showing the processing flow during operation of the autonomous learning type robot device shown in FIG. 1. As shown in FIG. 8, in step S21, the state determination unit 24 constituting the machine learning device 3 reads the trained neural networks.

In step S22, the state determination unit 24 acquires the sensor value X_t of the robot device 2 from the sensor unit 12.
In step S23, the operation pattern selection unit 22 estimates (selects) an operation pattern using the sensor value X_t input from the state determination unit 24.

In step S24, the operation pattern generation unit 23 inputs the sensor value X_t as data into the neural network and calculates the output value X_target.
In step S25, the operation pattern generation unit 23 outputs the output value X_target to the input unit (control unit 11) of the robot device 2.

In step S26, the state determination unit 24 acquires the sensor value X_now of the robot device 2 from the sensor unit 12.
In step S27, the state determination unit 24 determines whether the condition shown in the following equation (2) is satisfied.

If the condition is not satisfied, that is, if the sensor value X_now is not within the predetermined range ε of the output value X_target (the target value generated by the operation pattern generation unit 23 in step S24), the process returns to step S26. If the sensor value X_now is within the predetermined range ε of the output value X_target, the process proceeds to step S28.

In step S28, the state determination unit 24 determines whether the loop count has reached a preset number of executions. If it has, the process proceeds to step S29 and the operation ends. If it has not, the loop count is updated in step S30, the process returns to step S22, and the subsequent steps are repeated.
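The operation-time flow of steps S22 to S30 can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's implementation: equation (2) is not reproduced here, so the check is assumed to be an element-wise absolute difference of at most ε; the callables `read_sensor`, `select_pattern`, `generate_target`, and `send_command` are hypothetical stand-ins for the sensor unit 12, the operation pattern selection unit 22, the operation pattern generation unit 23, and the control unit 11; and `max_polls` is an added safety guard that does not appear in FIG. 8.

```python
from typing import Callable, List


def within_tolerance(target: List[float], current: List[float], epsilon: float) -> bool:
    """Assumed form of the condition checked in step S27."""
    return all(abs(t - c) <= epsilon for t, c in zip(target, current))


def run(
    read_sensor: Callable[[], List[float]],
    select_pattern: Callable[[List[float]], str],
    generate_target: Callable[[str, List[float]], List[float]],
    send_command: Callable[[List[float]], None],
    epsilon: float,
    max_loops: int,
    max_polls: int = 1000,
) -> int:
    completed = 0
    for _ in range(max_loops):                      # S28 / S30: bounded loop count
        x_t = read_sensor()                         # S22: acquire X_t
        pattern = select_pattern(x_t)               # S23: select an operation pattern
        x_target = generate_target(pattern, x_t)    # S24: compute X_target
        send_command(x_target)                      # S25: output to the control unit
        polls = 0
        # S26 / S27: poll X_now until it is within epsilon of X_target
        while not within_tolerance(x_target, read_sensor(), epsilon):
            polls += 1
            if polls >= max_polls:                  # added guard, not in FIG. 8
                raise TimeoutError("target not reached")
        completed += 1
    return completed                                # S29: end of operation


class SimulatedJoint:
    """Hypothetical one-joint robot that moves halfway to its command per read."""

    def __init__(self) -> None:
        self.angle = 0.0
        self.command = 0.0

    def read(self) -> List[float]:
        self.angle += 0.5 * (self.command - self.angle)
        return [self.angle]

    def send(self, target: List[float]) -> None:
        self.command = target[0]


joint = SimulatedJoint()
completed = run(
    read_sensor=joint.read,
    select_pattern=lambda x: "reach",
    generate_target=lambda pattern, x: [x[0] + 0.1],
    send_command=joint.send,
    epsilon=0.01,
    max_loops=3,
)
```

Waiting in the inner loop until the sensor value reaches the target is what paces the generation of the next target, which corresponds to the timing adjustment performed by the state determination unit 24.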

FIG. 9 is a diagram showing the data flow during operation of the autonomous learning type robot device shown in FIG. 1. The upper part of FIG. 9 shows the data flow when the image and joint angle information at time t are input and the joint angles at time t + 1 are estimated. The lower part of FIG. 9 shows the data flow when joint angle information is estimated sequentially until the target position is reached.

As shown in the upper part of FIG. 9, the operation pattern selection unit 22, having learned the taught motion, selects an initial operation pattern based on the image_t captured by the camera, which is sensor information, and outputs the selected operation pattern to the operation pattern generation unit 23. The operation pattern generation unit 23 sequentially generates motions based on the selected operation pattern input from the operation pattern selection unit 22 and the joint angles x_t, and outputs them as operation command values to the control unit 11 of the robot device 2, thereby realizing motion generation that responds to environmental changes.
As shown in the lower part of FIG. 9, the state determination unit 24 repeats the state determination until the condition shown in equation (2) above is satisfied, and the operation pattern generation unit 23 generates motions sequentially based on the determination result of the state determination unit 24, so that the operation timing of the operation pattern generation unit 23 is adjusted.

As described above, a variety of operation patterns can be acquired by cutting the taught motion into segments of a predetermined time width and learning each segment separately. Furthermore, by sequentially selecting and generating operation patterns based on the sensor information, it is possible to realize an autonomous learning type robot device 1 that can sequentially generate appropriate motions in response to environmental changes. In other words, by having the autonomous learning type robot device 1 learn the gripping motion for a stationary object with this configuration, a gripping motion for a moving object can be generated.

In this embodiment, extraction of the joint angles of the robot arm of the robot device 2 and learning of operation patterns have been described as an example; however, the hand position of the robot arm of the robot device 2 or the torque of each joint may be used instead.

Also, in this embodiment, the operation pattern is selected from the image captured by the camera at the earliest time in the window; however, the operation pattern selection unit 22 may instead be configured to learn and select using all of the images within the window width.

In this embodiment, the waypoints are extracted using the joint angles of the robot arm of the robot device 2. However, when the individual pieces of sensor information depend on one another, as the joint angle information of a robot arm does, the sensor information of the robot arm may first be converted into the hand position of the robot arm, and the waypoints may then be extracted using a motion minimization model (such as a minimum torque change model, a minimum muscle tension change model, or a minimum motion command model).
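As a minimal sketch of converting joint angles into a hand position, the following Python code applies forward kinematics to a planar two-link arm. The two-link planar geometry, the link lengths, and the function name `hand_position` are assumptions for illustration only; the embodiment does not specify the arm geometry, and the motion minimization models themselves are not shown.

```python
import math
from typing import Tuple


def hand_position(
    theta1: float, theta2: float, link1: float = 1.0, link2: float = 1.0
) -> Tuple[float, float]:
    """Hand (x, y) of a planar two-link arm.

    theta1 is the shoulder angle from the x-axis and theta2 is the elbow
    angle relative to the first link, both in radians.
    """
    x = link1 * math.cos(theta1) + link2 * math.cos(theta1 + theta2)
    y = link1 * math.sin(theta1) + link2 * math.sin(theta1 + theta2)
    return x, y


# Hypothetical joint-angle trajectory converted to a hand-position trajectory.
joint_trajectory = [(0.0, 0.0), (math.pi / 4, 0.0), (math.pi / 2, 0.0)]
hand_trajectory = [hand_position(t1, t2) for t1, t2 in joint_trajectory]
```

Working in hand coordinates collapses the interdependent joint angles into a single position, which is the quantity to which the waypoint extraction would then be applied.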

In this embodiment, a configuration has been described as an example in which the operation pattern selection unit 22 and the operation pattern generation unit 23 are trained together, so that the selection result of the operation pattern selection unit 22 is extracted as an image feature amount. That is, the result of the error calculation shown in FIG. 5 (error value E) is fed back to the operation pattern selection unit 22 and the operation pattern generation unit 23, and all of the weight parameters (W_c, W_i, W_r, W_o) are updated. Alternatively, the operation pattern selection unit 22 and the operation pattern generation unit 23 may be trained separately, and the operation pattern selection unit 22 may be configured to output the name, position, or the like of an object as its selection result. In this case, the weight parameters of the operation pattern selection unit 22 and of the operation pattern generation unit 23 are each updated based on the result of the error calculation (error value E).

Furthermore, although this embodiment has described a configuration in which the sensor unit 12 is provided in the robot device 2, the invention is not limited to this. For example, an external sensor such as a surveillance camera or a motion capture system may serve as the sensor unit 12 of the robot device 2.

As described above, this embodiment makes it possible to provide an autonomous learning type robot device, and a motion generation method for an autonomous learning type robot device, that are robust to changes in the state of the robot and in the environment and can execute different types of operation patterns.
Further, according to this embodiment, the waypoint extraction unit 21 extracts the learning data and thereby suppresses variation among the learning data, which makes it possible to improve learning performance and learning efficiency.

The present invention is not limited to the embodiment described above and includes various modifications. For example, the embodiment has been described in detail in order to explain the present invention in an easily understandable manner, and the invention is not necessarily limited to a configuration that includes all of the elements described.

1 ... Autonomous learning type robot device
2 ... Robot device
3 ... Machine learning device
11 ... Control unit
12 ... Sensor unit
21 ... Waypoint extraction unit
22 ... Operation pattern selection unit
23 ... Operation pattern generation unit
24 ... State determination unit

Claims

An autonomous learning type robot device comprising at least a robot device having a control unit, and a machine learning device electrically or communicably connected to the robot device,
wherein the machine learning device comprises:
a waypoint extraction unit that extracts operation waypoints of the robot device from sensor information, measured by a sensor unit, that includes the state of the robot device and environment information;
an operation pattern selection unit that learns, for each predetermined time width, an operation pattern for the waypoints extracted by the waypoint extraction unit, and selects an operation pattern based on the sensor information;
an operation pattern generation unit that learns, for each predetermined time width, an operation pattern of the robot for the waypoints extracted by the waypoint extraction unit, generates an operation pattern based on the sensor information and the operation pattern selected by the operation pattern selection unit, and outputs the generated operation pattern as an operation command to the control unit of the robot device; and
a state determination unit that compares the operation pattern generated by the operation pattern generation unit with the sensor information and determines the timing at which the operation pattern is output to the control unit of the robot device.

The autonomous learning type robot device according to claim 1,
wherein the state determination unit compares a target value generated by the operation pattern generation unit with the sensor information and determines the timing based on the comparison result.

The autonomous learning type robot device according to claim 2,
wherein the machine learning device obtains at least an error value of the operation pattern generated by the operation pattern generation unit during learning, and terminates learning when the obtained error value is equal to or less than a preset target value.

The autonomous learning type robot device according to claim 2,
wherein, when the comparison result of the state determination unit indicates that the difference between the target value generated by the operation pattern generation unit and the sensor information is within a predetermined range, the operation pattern generation unit outputs the generated operation pattern as an operation command to the control unit of the robot device.

The autonomous learning type robot device according to claim 3,
wherein, when the comparison result of the state determination unit indicates that the difference between the target value generated by the operation pattern generation unit and the sensor information is within a predetermined range, the operation pattern generation unit outputs the generated operation pattern as an operation command to the control unit of the robot device.

The autonomous learning type robot device according to claim 5,
wherein the operation pattern selection unit and the operation pattern generation unit each have a neural network, and the weight parameters of the neural networks of the operation pattern selection unit and the operation pattern generation unit are updated together by feeding back the obtained error value.

A motion generation method for an autonomous learning type robot device comprising at least a robot device having a control unit, and a machine learning device electrically or communicably connected to the robot device, the method comprising:
extracting, by a waypoint extraction unit, operation waypoints of the robot device from sensor information, measured by a sensor unit, that includes the state of the robot device and environment information;
learning, by an operation pattern selection unit, an operation pattern for the extracted waypoints for each predetermined time width, and selecting an operation pattern based on the sensor information;
learning, by an operation pattern generation unit, an operation pattern of the robot for the extracted waypoints for each predetermined time width, generating an operation pattern based on the sensor information and the operation pattern selected by the operation pattern selection unit, and outputting the generated operation pattern as an operation command to the control unit of the robot device; and
comparing, by a state determination unit, the operation pattern generated by the operation pattern generation unit with the sensor information, and determining the timing at which the operation pattern is output to the control unit of the robot device.

The motion generation method for an autonomous learning type robot device according to claim 7,
wherein the state determination unit compares a target value generated by the operation pattern generation unit with the sensor information and determines the timing based on the comparison result.

The motion generation method for an autonomous learning type robot device according to claim 8,
wherein at least an error value of the operation pattern generated by the operation pattern generation unit during learning is obtained, and learning is terminated when the obtained error value is equal to or less than a preset target value.

The motion generation method for an autonomous learning type robot device according to claim 9,
wherein, when the comparison result of the state determination unit indicates that the difference between the target value generated by the operation pattern generation unit and the sensor information is within a predetermined range, the operation pattern generation unit outputs the generated operation pattern as an operation command to the control unit of the robot device.