JP2021030407A

JP2021030407A - Information processing device, information processing method and program

Info

Publication number: JP2021030407A
Application number: JP2019156798A
Authority: JP
Inventors: Daisuke Yamada (山田 大輔); Kazuhiko Kobayashi (小林 一彦)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2021-03-01

Abstract

PROBLEM TO BE SOLVED: To cause a robot to execute work even when an instruction from a user is insufficient. SOLUTION: An information processing device causes a robot to execute a work group composed of a plurality of tasks. The information processing device includes input means for inputting instruction information indicating at least part of the work group, and identification means for identifying, among work groups of a plurality of patterns, a work group that includes the task indicated by the instruction information as a candidate for the work group to be executed by the robot. SELECTED DRAWING: Figure 1

Description

The present invention relates to work instructions given to a robot.

Conventionally, a robot device or the like having a voice recognition function has been programmed in advance so that one specific command word corresponds to one specific task; when the instructor utters that command word, the robot performs the associated task. In Patent Document 1, by contrast, a combination of a plurality of operations is executed in response to a single work instruction.

Patent Document 1: JP-A-2002-337075

However, in Patent Document 1, when the robot is made to perform a plurality of operations, it can execute only combinations of operations that have already been taught for a single instruction from the user. For example, for a robot that assembles a product by combining a plurality of parts, the instruction or information specifying which product is to be made may not have been associated in advance with a work group, that is, a combination of tasks. In that case, if the user's instruction is insufficient for the robot to identify the series of tasks making up the work group, the robot may be unable to perform some of the necessary tasks in that series. Consequently, when the user's instruction is insufficient, the robot cannot identify the work group and cannot execute the series of tasks the user intended.

Accordingly, an object of the present invention is to cause a robot to execute a work group even when the user's instruction is insufficient.

To solve the above problem, an information processing device according to the present invention causes a robot to execute a work group composed of a plurality of tasks, and includes: input means for inputting instruction information indicating at least some of the tasks in the work group; and identification means for identifying, among work groups of a plurality of patterns, a work group that includes the task indicated by the instruction information as a candidate for the work group to be executed by the robot.

According to the present invention, a robot can be made to execute a work group even when the user's instruction is insufficient.

FIG. 1: Diagram showing a configuration example of an information processing system
FIG. 2: Block diagram showing a functional configuration example of an information processing device
FIG. 3: Flowchart explaining the processing executed by the information processing system
FIG. 4: Diagram explaining state transitions of an object
FIG. 5: Diagram showing an example of the state of an object
FIG. 6: Diagram showing an example of image information
FIG. 7: Diagram showing an example of a target state
FIG. 8: Diagram showing the coordinate system of the information processing system
FIG. 9: Block diagram showing a functional configuration example of an information processing device
FIG. 10: Flowchart explaining the processing executed by the information processing system
FIG. 11: Diagram showing an example of a GUI
FIG. 12: Block diagram showing a functional configuration example of an information processing device
FIG. 13: Flowchart explaining the processing executed by the information processing system
FIG. 14: Block diagram showing a functional configuration example of an information processing device
FIG. 15: Diagram showing a hardware configuration example of the information processing device
FIG. 16: Flowchart explaining the processing executed by the information processing device
FIG. 17: Flowchart explaining the processing executed by the work group identification unit
FIG. 18: Flowchart explaining the processing executed by the work group identification unit

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. The configurations shown in the following embodiments are merely examples, and the present invention is not limited to the illustrated configurations.

＜第一の実施形態＞
第一の実施形態では、弁当の食品詰めをロボットにより行う工程において、指示者の指示する指示に不足がある場合に、情報処理装置がロボットに実行させる作業群を特定することで、ロボットがユーザに指示された作業を実行する例について述べる。例えば、工場ではベテランの従事者もいればバイト等で一時的に雇われた人もいる。後者の場合は、ロボットに指示を与える方法を十分に習得しているとは限らない。例えば、作るべき製品の名前や型番を覚えられていない場合、次に作るべき製品の名前や型番を使ってロボットに指示することができないことがある。また、多品目を少数生産する工場では、生産品目の変更が一日のうちに何度も行われる。そのような現場において、製品の型番とロボットの作業内容を予め関連付けるなど、ロボットに作業の変更を逐一指示しておくことはロボットの数や生産品目の数が多ければ多いほど手間がかかる。また、工場従事者は作業中に手元が空いていない時間が長いため、タブレットといった通信端末でロボットに指示を行うより、音声による指示の方が手間は少ない。とはいえ、工場内の騒音によって、目的の装置に完全な音声指示を入力できない場合もある。すなわち、工場従事者が協働するロボットに音声による指示を行う際、ユーザによって常に適切な指示を入力されるとは限らない。このような状況下において有効な実施形態を以下で説明する。 <First Embodiment>
In the first embodiment, in the process of packing food for a lunch box by a robot, when the instruction given by the instructor is insufficient, the robot specifies a work group to be executed by the robot by the information processing apparatus, so that the robot can use the user. An example of performing the work instructed in is described. For example, some factories are veteran workers and others are temporarily hired for part-time jobs. In the latter case, the robot is not always well-versed in how to give instructions. For example, if you do not remember the name or model number of the product to be made, you may not be able to tell the robot using the name or model number of the product to be made next. Moreover, in a factory that produces a large number of items in a small number, the production items are changed many times in a day. In such a field, it takes more time and effort to instruct the robot to change the work one by one, such as associating the model number of the product with the work content of the robot in advance, as the number of robots and the number of production items increases. In addition, since factory workers spend a lot of time not having their hands free during work, it takes less time and effort to give instructions by voice than to give instructions to a robot using a communication terminal such as a tablet. However, noise in the factory may prevent you from entering complete voice instructions into the device of interest. That is, when a factory worker gives a voice instruction to a collaborating robot, the user does not always input an appropriate instruction. An embodiment effective under such a situation will be described below.

Suppose, for example, that a robot is placing a second object on a first object and a third object on the second object, and the instructor wants it to place a fourth object instead of the third. Strictly, the command the instructor would have to utter is not merely an instruction to change to the fourth object, but an instruction to place the second object on the first object and then place the fourth object on the second object. Therefore, if the instructor omits part of the content or gives the steps in the wrong order, the robot cannot be made to execute the work group the instructor intends. Because the robot is given only the information about the change, the missing information must be supplemented on the robot side. The present embodiment therefore describes a device that causes the robot to execute the desired work group from only a partial instruction by the user.
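The stacking example above can be illustrated with a minimal Python sketch, assuming that each stored work group is an ordered list of (placed object, supporting object) tasks. The pattern names and object labels are hypothetical; the sketch only shows how a partial instruction that mentions the fourth object can be matched against stored patterns so that the omitted steps are recovered from the pattern rather than from the user.

```python
from typing import Dict, List, Tuple

# A task is (object placed, object it is placed on); a work group is an ordered list of tasks.
Task = Tuple[str, str]

# Hypothetical stored work patterns; names and contents are illustrative only.
WORK_PATTERNS: Dict[str, List[Task]] = {
    "pattern_A": [("object2", "object1"), ("object3", "object2")],
    "pattern_B": [("object2", "object1"), ("object4", "object2")],
}


def candidates_for(mentioned_object: str) -> List[Tuple[str, List[Task]]]:
    """Return every stored work group containing a task that involves the mentioned object."""
    return [
        (name, tasks)
        for name, tasks in WORK_PATTERNS.items()
        if any(mentioned_object in task for task in tasks)
    ]


# The user only says "use the fourth object"; the stored pattern supplies the omitted steps.
print(candidates_for("object4"))
# -> [('pattern_B', [('object2', 'object1'), ('object4', 'object2')])]
```

If the mentioned object appears in several patterns (here, "object2" appears in both), more than one candidate is returned and further information is needed to choose among them.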

FIG. 1 shows a configuration example of an information processing system 100 including the information processing device 10 according to the present embodiment. The voice input device 1 is a device, such as a microphone, that converts sound into an electric signal. It acquires the voice uttered by the instructor as voice information, which is an operation instruction from the instructor. For example, if production switches from some bento to a pickled-plum (umeboshi) bento, the instruction might be "The next bento is pickled plum, so set it up." The voice information output from the voice input device 1 is taken in as an input to the information processing device 10. The voice input device 1 may be fixed relative to the environment or mounted on a moving mechanism such as the robot. A plurality of voice input devices 1 may also be arranged; in that case, only voice arriving from a specific direction may be acquired by using the phase difference between the devices. Alternatively, a directional microphone may be used to acquire only sound from a specific direction.
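Direction-selective capture with two microphones can be sketched in Python as follows. This is not the document's method: it swaps in a simple time-domain cross-correlation to estimate the inter-microphone delay (the time-domain counterpart of the phase difference) and converts it to an arrival angle. The speed of sound, sample rate, microphone spacing, target angle, and tolerance are all illustrative assumptions.

```python
import math
from typing import List

SPEED_OF_SOUND = 343.0  # m/s, approximate speed of sound in air at about 20 degrees C


def estimate_delay(left: List[float], right: List[float], max_lag: int) -> int:
    """Return the lag (in samples) maximizing the cross-correlation.

    A positive lag means the sound reached the left microphone first.
    """
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < len(right))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle(lag: int, sample_rate: float, mic_spacing: float) -> float:
    """Direction of arrival in degrees from broadside for a two-microphone pair."""
    ratio = SPEED_OF_SOUND * (lag / sample_rate) / mic_spacing
    ratio = max(-1.0, min(1.0, ratio))  # clamp: noise can push the ratio past +/-1
    return math.degrees(math.asin(ratio))


def is_from_target_direction(angle: float, target: float, tolerance: float) -> bool:
    """Keep audio only if it arrives within +/- tolerance degrees of the target direction."""
    return abs(angle - target) <= tolerance


# Example: an impulse reaches the right microphone 3 samples after the left one.
left = [0.0] * 20
right = [0.0] * 20
left[5], right[8] = 1.0, 1.0
lag = estimate_delay(left, right, max_lag=5)
angle = arrival_angle(lag, sample_rate=16000.0, mic_spacing=0.1)
print(lag, round(angle, 1), is_from_target_direction(angle, target=40.0, tolerance=5.0))
```

Limiting `max_lag` to roughly the physical maximum delay (spacing divided by the speed of sound, times the sample rate) keeps the search small and avoids implausible angles.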

The image pickup device 2 is a visual sensor, such as a digital camera with a lens and an electronic image sensor, that acquires the state of the target object 5 as image information. Specifically, the image information is an image capturing the initial state in which the target object 5 is supplied. Two-dimensional color image information is acquired as the image information, or range (distance) image information in the case of a stereo camera. The image information output from the image pickup device 2 is taken in as an input to the information processing device 10. The light source 6 emits illumination that brings out the pattern or edges of the target object. When the light source 6 is a projector, a range image can be acquired by projecting pattern light. The image pickup device 2 and the light source 6 may be fixed relative to the imaging target or mounted on a moving mechanism such as the robot. A plurality of image pickup devices 2 may also be arranged.

The target object 5 mainly refers to a material present in the robot's work space, for example items discharged one after another from a manufacturing apparatus onto a belt conveyor or the like, or items placed near the belt conveyor, one or more of which are within the imaging range of the image pickup device 2 at the same time. The target object 5 need not be an entire object; it may be a partial region of an object. The present embodiment shows the case where the target object moves into the range of the image pickup device 2; when work is performed on a plurality of target objects that are already arranged, without a belt conveyor, this can be handled by making the system movable.

The robot 20 is, for example, an articulated robot including a manipulator 202 such as a robot arm, a gripping device 203 such as a robot hand, and a controller 201 that controls the manipulator 202 and the gripping device 203. The robot 20 also has a position/orientation changing mechanism that changes the position and orientation of the gripping device 203 by changing the angle of each joint of the manipulator 202. This mechanism may be driven by electric motors or by actuators operated by fluid pressure such as hydraulic or pneumatic pressure, and it is driven according to control information output from the information processing device 10. The robot 20 is not limited to an articulated robot and may be any movable machine capable of numerical control (NC). The controller 201 controls the manipulator 202 and the gripping device 203 based on the position/orientation control information sent from the generation unit 107. The controller 201 may be a robot controller attached to the manipulator 202, a programmable logic controller (PLC), or any other device capable of controlling the manipulator 202 and the gripping device 203.

The robot 20 executes work groups of a plurality of patterns, each consisting of tasks executed in succession. Specifically, it moves to a position and orientation for capturing an image in order to recognize the target object 5, which is a material, and it manipulates the target object 5 based on the position/orientation control information presented by the information processing device 10.

The gripping device 203 is a tool that enables the robot 20 to perform operations suited to the type of target object. It may be, for example, a hand with a motor-driven chuck mechanism that grips the target object, or a hand with a suction pad that holds the target object by air pressure. The gripping device 203 is detachably attached to the manipulator 202 and can be exchanged according to the type of target object. The gripping device 203 is not always necessary and can be omitted if the manipulator 202 alone can manipulate the target object. For example, for work that pushes out the target object, the robot 20 may consist of a single-axis cylinder without a gripping device 203.

FIG. 2 is a functional block diagram of the information processing system 100 and the information processing device 10. The information processing device 10 includes an instruction input unit 101, a visual information acquisition unit 102, a work storage unit 103, an estimation unit 104, a state storage unit 105, a work group identification unit 106, and a generation unit 107.

The instruction input unit 101 accepts an instruction input by the user as instruction information.

The visual information acquisition unit 102 acquires the visual information output from the image pickup device 2, converts it into image information, and outputs it to the estimation unit 104. The conversion method is described later. The image information includes the captured image and the recognition results for the objects contained in that image. The visual information acquisition unit 102 is composed of, for example, a capture board and memory (RAM).

The work storage unit 103 stores a plurality of patterns of work groups, each to be executed by the robot in succession; an example is the series of operations by which the information processing system 100 grips, transports, and places the target object 5. A work group indicating the tasks by which the robot manipulates the target object 5, together with their order, is also called a work pattern.

The estimation unit 104 estimates at least some of the tasks of a work group from the input instruction information and the work groups stored in the work storage unit 103, and sends work information about the estimated tasks to the work group identification unit 106. The estimation unit 104 also estimates the state of the target object from the input instruction information or image information, based on the state transition information stored in the state storage unit 105. Details are described later.

The state storage unit 105 stores text information converted from the voice information acquired by the voice input device 1, together with information on the states of, and operations on, the target object 5, and inputs this to the work group identification unit 106. Here, the information on the states of the target object 5 covers the current state in which the target object 5 is supplied (initial state), the state aimed at after the work on the target object 5 is completed (target state), and the states passed through while transitioning from the current state to the target state (intermediate states).
The information on operations on the target object 5 describes the operations the information processing system 100 performs on the target object 5 while transitioning it from the initial state to the target state. An operation is either an action on the target object 5 that changes its position and orientation, such as gripping, transporting, or placing by the robot 20, or an action that changes the position and orientation of the image pickup device 2 or the gripping device 203 in order to inspect the target object 5. The state storage unit 105 also stores state transition information for the target object 5: information on the initial state of the target object serving as a material, on the intermediate and target states built from that material, and on whether a transition from each state to the next is possible. The initial, intermediate, and target states are each stored as a combination of image information and text information. When a state contains a plurality of target objects 5, their relative positional relationship can be determined from the image information, the text information, or both. The work storage unit 103 may also store information on operating the objects; for example, when a specific target object 5 is manipulated by the information processing system 100, the gripping position on the target object 5 and the gripping force may be stored. The work storage unit 103 may also store information on how to transition between states.
For example, when the information processing system 100 transports the target object 5 from one state to another, a limit on the transport speed or the trajectory of the transport path may be stored. The work storage unit 103 may store information either per specific object or per object class. In the former case, the processing of the information processing system 100 is assumed to handle only objects stored in advance, so unstored objects are not handled; in the latter case, even objects not stored in advance can be classified into one of the classes. For example, when the target objects 5 include "apple", "mandarin orange", and "peach", state transition information may be stored for each of them individually, or once for a "fruit" class that groups all three.
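The state transition information described above (states plus whether each state can transition to the next) can be viewed as a directed graph. The following Python sketch assumes hypothetical bento states and uses a breadth-first search to find a shortest chain of allowed transitions from an initial state to a target state; the document does not state that this search is used, so treat it as one possible way to exploit the stored information.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical state-transition information for a bento: state -> states reachable by one operation.
TRANSITIONS: Dict[str, List[str]] = {
    "empty_box": ["rice_packed"],
    "rice_packed": ["rice_and_side_dish", "rice_and_plum"],
    "rice_and_side_dish": ["plum_bento_done"],
    "rice_and_plum": ["plum_bento_done"],
    "plum_bento_done": [],
}


def plan_states(initial: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest chain of allowed transitions from initial to target."""
    queue = deque([[initial]])
    visited = {initial}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in TRANSITIONS.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # the target state is unreachable with the stored transitions


print(plan_states("rice_packed", "plum_bento_done"))
```

Storing transitions per object class (for example "fruit") rather than per object would only change the keys of `TRANSITIONS`; the search itself stays the same.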

The work group identification unit 106 identifies, from the work groups of a plurality of patterns, those that include the task estimated by the estimation unit 104 as the task indicated by the instruction information, and treats them as candidates for the work group to be executed by the robot. That is, among the work groups stored in the work storage unit 103, it identifies the work groups containing the estimated task. When the estimated task is included in work groups of a plurality of patterns, these work groups are identified as candidates, and the work group to be executed is then narrowed down from the candidates by acquiring additional information. Details are described later. The work group identification unit 106 sends the identified work group to the generation unit 107.
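A minimal Python sketch of the two-stage identification follows. It assumes, purely for illustration, that the additional information is the set of materials recognized in the supply area from the image information; the document leaves the kind of additional information open. The `WorkGroup` type and the bento names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class WorkGroup:
    name: str
    tasks: List[str]                                   # ordered task labels
    materials: Set[str] = field(default_factory=set)   # objects the work group needs


def find_candidates(groups: List[WorkGroup], estimated_task: str) -> List[WorkGroup]:
    """Candidate work groups: those containing the task estimated from the instruction."""
    return [g for g in groups if estimated_task in g.tasks]


def narrow_down(candidates: List[WorkGroup], detected: Set[str]) -> List[WorkGroup]:
    """Keep candidates whose required materials were all recognized in the supply area."""
    return [g for g in candidates if g.materials <= detected]


groups = [
    WorkGroup("plum_bento", ["place rice", "place plum"], {"rice", "plum"}),
    WorkGroup("salmon_plum_bento", ["place rice", "place salmon", "place plum"], {"rice", "salmon", "plum"}),
    WorkGroup("salmon_bento", ["place rice", "place salmon"], {"rice", "salmon"}),
]
candidates = find_candidates(groups, "place plum")  # two candidates remain
chosen = narrow_down(candidates, detected={"rice", "plum"})
print([g.name for g in candidates], [g.name for g in chosen])
```

If narrowing still leaves several candidates, a system could, for example, ask the user to choose; that step is not shown here.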

The generation unit 107 generates control information for the robot 20 to manipulate the target object 5, based on the work group identified by the work group identification unit 106 and the image information sent from the visual information acquisition unit 102. Here, the direction and amount of movement of the robot are generated as control information from the position and orientation the robot should take and its current position and orientation. The generation unit 107 sends the generated control information to the robot 20.
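The "direction and amount of movement" computed by the generation unit 107 can be sketched for the translational part only. This Python sketch assumes positions given as Cartesian coordinates in a common frame and ignores orientation; a real controller would also handle rotation and joint limits.

```python
import math
from typing import Sequence, Tuple


def movement_command(current: Sequence[float], target: Sequence[float]) -> Tuple[Tuple[float, ...], float]:
    """Return (unit direction vector, distance) from the current to the target position.

    Orientation is deliberately ignored in this sketch.
    """
    delta = [t - c for c, t in zip(current, target)]
    distance = math.sqrt(sum(d * d for d in delta))
    if distance == 0.0:
        return tuple(0.0 for _ in delta), 0.0  # already at the target: no movement
    return tuple(d / distance for d in delta), distance


direction, amount = movement_command((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
print(direction, amount)  # (0.6, 0.8, 0.0) 5.0
```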

An overview of the processing executed by the information processing device 10 is described with reference to FIG. 16. In the following description, each step is denoted by a number prefixed with S. In S1601, the CPU 11 performs initialization. In S1602, the instruction input unit 101 accepts an instruction input. In S1603, the estimation unit 104 estimates the work content from the input instruction. In S1604, the work group identification unit 106 identifies, from the work groups of a plurality of patterns, a work group including the task indicated by the instruction information as a candidate for the work group to be executed by the robot. In S1605, the generation unit 107 generates control information for the robot 20 to manipulate the target object 5, based on the identified work group and the image information sent from the visual information acquisition unit 102. In S1606, the instruction input unit 101 determines whether there is a further instruction. If an end instruction has been given, the processing ends; otherwise, the processing returns to S1602. If no new instruction is input, the same work group may continue to be executed.
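The S1601 to S1606 loop can be sketched in Python with the individual units passed in as callables. The callable names and the conventions that `None` means "no new instruction" and `"end"` means an end instruction are assumptions made for this illustration.

```python
from typing import Callable, Iterable, List, Optional


def control_loop(
    instructions: Iterable[Optional[str]],
    estimate: Callable[[str], str],
    identify: Callable[[str], List[str]],
    execute: Callable[[List[str]], None],
) -> None:
    """Repeat S1602 to S1606; None stands for 'no new instruction', "end" for an end instruction."""
    current: Optional[List[str]] = None       # S1601: start with no work group selected
    for instruction in instructions:          # S1602: receive the next instruction
        if instruction == "end":              # S1606: an end instruction terminates the loop
            break
        if instruction is not None:
            task = estimate(instruction)      # S1603: estimate the work content
            current = identify(task)          # S1604: identify the work group
        if current is not None:
            execute(current)                  # S1605: generate control info and run the group


log: List[List[str]] = []
control_loop(["plum", None, "end", "salmon"],
             estimate=lambda s: s,
             identify=lambda t: [t + "_group"],
             execute=log.append)
print(log)  # the same group runs again when no new instruction arrives
```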

FIG. 3 is a flowchart of the processing executed by the information processing system 100 and the information processing device 10. This processing is realized by the CPU 11 of the information processing device 10 shown in FIG. 15 reading and executing a program stored in the ROM 12 or the external memory 14. However, part or all of the processing in FIG. 3 may be realized by dedicated hardware. The processing in FIG. 3 starts, for example, when the operator starts up the information processing system 100, although the start timing is not limited to startup.

An overview of the processing executed by the information processing system 100 is as follows. In S1, the CPU 11 initializes the system; this corresponds to S1601 in FIG. 16. In S2, the voice input device 1 inputs voice information containing the user's instruction to the robot into the information processing device; this corresponds to S1602 in FIG. 16. In S3, the estimation unit 104 estimates the work content from the input instruction; this corresponds to S1603 in FIG. 16. In S4, the controller 201 determines the position and orientation of the image pickup device 2. In S5, the controller 201 moves the manipulator 202 to the imaging position. In S6, the image pickup device 2 images the target object 5 in the supply area. In S7, the visual information acquisition unit 102 acquires image information capturing the target object in its initial state.
In S8, the work group identification unit 106 identifies, from the work groups of a plurality of patterns, the work groups including the task indicated by the instruction information as candidates; this corresponds to S1604 in FIG. 16. It then identifies the work group to be executed by the robot based on the auxiliary information and the candidate work groups. In S9, the work group identification unit 106 determines the work order based on the identified work group. In S10, the controller 201 determines the position and orientation for imaging the target object 5. In S11, the controller 201 controls the manipulator 202 for imaging. In S12, the image pickup device 2 images the target object 5 from the position and orientation determined in S10. In S13, the visual information acquisition unit 102 acquires the image information of the target object 5 output by the image pickup device 2 and sends it to the generation unit 107. In S14, the generation unit 107 generates position/orientation control information for the robot 20; this corresponds to S1605 in FIG. 16. In S15, the controller 201 controls the manipulator 202 and the gripping device 203 for manipulating the target object 5. In S16, based on the control performed in S15, the manipulator 202 executes operations on the target object 5 such as moving, gripping, transporting, and placing. In S17, the CPU 21 determines whether there is a next target object 5.

First, in S1, the CPU 11 performs system initialization. That is, the CPU 11 loads a program stored in the ROM 12 or the external memory 14, expands it in the RAM 13, and makes it executable. The CPU 11 also reads the parameters of each device connected to the information processing device 10 and returns each device to its initial position so that it is ready for use.

In S2, the voice input device 1 inputs to the information processing device the voice information with which the user instructs the robot. This corresponds to the process of S1602 in FIG. 16. That is, the instruction input unit 101 inputs an instruction from the user. For example, the voice input device 1 starts inputting voice information when the instructor utters a specific keyword, and stops inputting voice information when no voice has been input for a certain period of time. The voice input method is not limited to this; for example, the voice input device may start recording when it detects the voice of a specific person. The input instruction information is also not limited to voice information. For example, it may be video information obtained by capturing a gesture, or operation information from a predetermined operation on a keyboard or an information terminal.
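The keyword-start, silence-stop behavior described for the voice input device 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's implementation: the `Frame` record, the per-frame voice-activity flag, and the wake word `"robot"` are all hypothetical, and a real device would operate on audio samples rather than pre-labelled frames.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Frame:
    """One short audio frame after preprocessing (hypothetical format)."""
    t: float         # timestamp in seconds
    is_speech: bool  # result of a voice-activity detector
    text: str = ""   # partial recognition result for this frame, if any


def capture_utterance(frames: Iterable[Frame], keyword: str,
                      silence_timeout: float) -> Optional[List[Frame]]:
    """Collect the speech frames that follow `keyword`.

    Recording starts on the first frame whose text contains `keyword`
    and stops once no speech has been seen for `silence_timeout` seconds.
    Returns None if the keyword never occurs.
    """
    recording = False
    last_speech_t = 0.0
    captured: List[Frame] = []
    for f in frames:
        if not recording:
            if keyword in f.text:
                recording = True
                last_speech_t = f.t
            continue
        if f.is_speech:
            last_speech_t = f.t
            captured.append(f)
        elif f.t - last_speech_t >= silence_timeout:
            break  # silence lasted long enough: end of the instruction
    return captured if recording else None


frames = [Frame(0.0, True, "robot"), Frame(0.5, True, "put"),
          Frame(1.0, True, "plum"), Frame(1.5, False), Frame(2.0, False),
          Frame(2.5, True, "ignored")]
utterance = capture_utterance(frames, keyword="robot", silence_timeout=1.0)
print([f.text for f in utterance])  # ['put', 'plum']
```

Silent frames are not stored, so the returned list contains only the speech that makes up the instruction itself.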

In S3, the estimation unit 104 estimates the work content from the input instruction. This corresponds to the process of S1603 in FIG. 16. Here, recognition processing is performed on the voice information input in S2. The instruction information obtained by recognizing the voice information includes information on the state of the target object, and is input to the estimation unit 104. In the recognition processing, the voice information is converted into text information using, for example, a hidden Markov model. Then, information on the intended state of the target object 5 and on the operation is extracted from the text information by a sequence-to-sequence (Seq2Seq) model using long short-term memory (LSTM). Alternatively, this information may be obtained by segmenting the text information into words by morphological analysis and then estimating the part of speech of each word, or by other methods. The voice information may also be converted into instruction information using other existing techniques.
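As a rough stand-in for the word-segmentation alternative mentioned above (not the hidden Markov model or Seq2Seq models), the following sketch pulls the operation, the object, and its destination out of an English transcript using fixed keyword lists. The vocabularies `OPERATION_WORDS` and `OBJECT_PHRASES` and the output labels are hypothetical; the sketch only shows what structured instruction information extracted from text might look like.

```python
import re
from typing import Dict, Optional

# Hypothetical vocabularies standing in for morphological analysis and
# part-of-speech estimation; they are not part of the described system.
OPERATION_WORDS = {"put": "place", "putting": "place", "place": "place",
                   "grip": "grip", "pick": "grip"}
OBJECT_PHRASES = {"pickled plum": "umeboshi", "white rice": "white_rice",
                  "lunch box": "lunch_box"}


def extract_instruction(text: str) -> Dict[str, Optional[str]]:
    """Extract operation, object, and destination from an English instruction."""
    s = text.lower()
    # Operation: the first word that appears in the operation vocabulary.
    operation = next((OPERATION_WORDS[w] for w in re.findall(r"[a-z]+", s)
                      if w in OPERATION_WORDS), None)
    # Destination: a known object phrase right after "in"/"into"/"on"/"onto".
    destination = None
    m = re.search(r"\b(?:in|into|on|onto)\s+(?:the\s+)?([a-z ]+)", s)
    if m:
        destination = next((label for phrase, label in OBJECT_PHRASES.items()
                            if m.group(1).startswith(phrase)), None)
    # Object: the earliest known object phrase that is not the destination.
    mentions = sorted((s.find(p), label) for p, label in OBJECT_PHRASES.items()
                      if p in s and label != destination)
    obj = mentions[0][1] if mentions else None
    return {"operation": operation, "object": obj, "destination": destination}


print(extract_instruction(
    "Get it done up to putting the pickled plum in the lunch box."))
# {'operation': 'place', 'object': 'umeboshi', 'destination': 'lunch_box'}
```

Note that the result describes only the single work the user named; recovering the unstated preceding works is left to the work group identification in S8.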

In S4, for imaging by the imaging device 2, the controller 201 determines the position and orientation of the manipulator 202 at the time of imaging based on a preset position and orientation. Alternatively, an imaging position and orientation determination unit (not shown) determines them based on the current position and orientation of the robot and sends them to the controller 201. Any method may be used to determine the position and orientation as long as the target object 5 in the supply area can be imaged. If the supply area is so wide that it is difficult to acquire visual information of the whole supply area in a single image, a plurality of positions and orientations may be determined, and the position and orientation may be determined using the visual information acquired at each of them. Further, if the intended work requires visual information of an area other than the supply area in order to transition the target object 5 to the target state, the imaging position and orientation may also be determined for that area. For example, in work that moves the target object 5 from the supply area to a placement area different from the supply area, an imaging position and orientation for acquiring visual information of the placement area may be determined, and the placement area may be imaged. In the present embodiment, it is assumed that only the materials (objects) necessary for the target state are prepared in the supply area. When the target state is changed to another one, the materials (objects) placed in the supply area are changed accordingly.
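For the case where a single image cannot cover the supply area, the sketch below tiles a rectangular area with overlapping camera footprints. It assumes a camera looking straight down from a fixed height with a known rectangular footprint on the table; the function names and the 10 % default overlap are illustrative choices, not values from this description.

```python
import math
from typing import List, Tuple


def _centres(length: float, footprint: float, overlap: float) -> List[float]:
    """Evenly spaced footprint centres covering [0, length] along one axis."""
    if length <= footprint:
        return [length / 2]
    step = footprint * (1 - overlap)           # largest spacing allowed
    n = math.ceil((length - footprint) / step) + 1
    spacing = (length - footprint) / (n - 1)   # <= step, so overlap is kept
    return [footprint / 2 + i * spacing for i in range(n)]


def coverage_positions(area_w: float, area_h: float, fov_w: float,
                       fov_h: float, height: float,
                       overlap: float = 0.1) -> List[Tuple[float, float, float]]:
    """Camera positions (x, y, z) that tile a rectangular supply area.

    Assumes the camera looks straight down, so only its position changes,
    and that (fov_w, fov_h) is the image footprint on the table at `height`.
    One corner of the area is the origin of the frame.
    """
    return [(x, y, height)
            for y in _centres(area_h, fov_h, overlap)
            for x in _centres(area_w, fov_w, overlap)]


print(coverage_positions(1.0, 0.4, 0.5, 0.5, height=0.6))
# [(0.25, 0.2, 0.6), (0.5, 0.2, 0.6), (0.75, 0.2, 0.6)]
```

Spreading the centres evenly, instead of stepping by the maximum spacing, keeps the first and last images flush with the area boundary.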

In S5, the controller 201 controls the manipulator 202 for imaging. Specifically, the controller 201 determines how to move the manipulator 202 to the position and orientation determined in S4. For example, to move the manipulator 202 to the position and orientation generated in S4, the target position and orientation information is converted into joint angle information of the manipulator 202 by inverse kinematics. Command values for driving the actuator of each joint of the manipulator 202 are then calculated, and the manipulator 202 operates based on the calculation result.
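Converting a target position into joint angles is an inverse kinematics problem. The sketch below uses a planar two-link arm, which has a closed-form solution, purely to illustrate the idea; the manipulator 202 has more joints and would need a general multi-joint solver. The link lengths and the target point are hypothetical.

```python
import math
from typing import Tuple


def forward_kinematics(q1: float, q2: float, l1: float,
                       l2: float) -> Tuple[float, float]:
    """Tip position of a planar two-link arm for joint angles q1, q2 (rad)."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def inverse_kinematics(x: float, y: float, l1: float, l2: float,
                       other_branch: bool = False) -> Tuple[float, float]:
    """Joint angles that place the tip at (x, y); raises if unreachable."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target is outside the workspace")
    s2 = math.sqrt(1.0 - c2 * c2)
    if other_branch:
        s2 = -s2  # the mirrored elbow configuration
    q2 = math.atan2(s2, c2)
    q1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
    return q1, q2


q = inverse_kinematics(0.30, 0.20, l1=0.25, l2=0.20)
print(forward_kinematics(*q, l1=0.25, l2=0.20))  # approximately (0.3, 0.2)
```

Feeding the result back through forward kinematics, as in the last line, is a cheap consistency check before joint command values are computed.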

In S6, the imaging device 2 images the target object 5 in the supply area. The target object at this point is defined as the target object in the initial state.

In S7, the visual information acquisition unit 102 acquires image information obtained by imaging the target object in the initial state. That is, it acquires the image information of the supply area output by the imaging device 2 and inputs it to the estimation unit 104. From this image information, the estimation unit 104 can identify the initial state of the target object.

In S8, the work group identification unit 106 identifies, from among a plurality of patterns of work groups, each work group that includes the work indicated by the instruction information as a candidate work group. This corresponds to the process of S1604 in FIG. 16.

The outline of the processing of S8 will be described with reference to FIG. 17. In S181, the work group identification unit 106 first identifies the instructed work; that is, it acquires information on the work estimated in S3. In S182, the work group identification unit 106 acquires a plurality of patterns of work groups, each of which the robot is to execute continuously. In S183, the work group identification unit 106 collates the plurality of patterns of work groups with the instructed work and identifies the candidate work groups. In S184, the work group identification unit 106 determines whether there are a plurality of work groups that include the instructed work. If there are, the process proceeds to S185; if the work group can be narrowed down to one, the process proceeds to S186. In S185, the work group identification unit 106 acquires auxiliary information for identifying the work group to be executed by the robot. Here, for example, the work that can be performed with the target object is estimated based on the target object estimated from the image information and the state transition information on that target object. In S186, the work group identification unit 106 identifies, based on the auxiliary information, the work group to be executed by the robot from among the candidate work groups. Through this processing, the work group to be executed by the robot can be identified even when the instruction is insufficient or cannot be recognized well.
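The flow of S183 to S186 can be sketched as follows, assuming work groups are stored as ordered lists of work numbers. The `resolve` callback is a hypothetical placeholder for whatever auxiliary information S185 supplies; the usage lines use work groups F and H from the example in this description.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Work groups as ordered sequences of work numbers, e.g. {"F": [1, 2, 3]}.
WorkGroups = Dict[str, Sequence[int]]


def find_candidates(instructed_work: int, groups: WorkGroups) -> List[str]:
    """S183: names of all work groups that contain the instructed work."""
    return [name for name, works in groups.items() if instructed_work in works]


def identify_work_group(instructed_work: int, groups: WorkGroups,
                        resolve: Callable[[List[str]], Optional[str]]
                        ) -> Optional[str]:
    """S183 to S186: choose the work group the robot should execute.

    `resolve` stands in for S185/S186: it receives the candidate names and
    uses some auxiliary information to pick one, or returns None.
    """
    candidates = find_candidates(instructed_work, groups)
    if not candidates:
        return None               # no stored work group explains the instruction
    if len(candidates) == 1:      # S184 -> S186 directly
        return candidates[0]
    return resolve(candidates)    # S184 -> S185 -> S186


groups = {"F": [1, 2, 3], "H": [1, 2, 3, 4]}
print(identify_work_group(3, {"F": [1, 2, 3]}, resolve=lambda c: None))  # F
print(identify_work_group(3, groups, resolve=lambda c: c[-1]))           # H
```

Passing the resolver in as a function keeps the candidate search independent of which kind of auxiliary information is available.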

Specifically, a case will be described in which this system is used in the process of producing Hinomaru bento on a bento production line. The instructor intends to make a Hinomaru bento (the target state) by first putting white rice into an empty lunch box and then placing a pickled plum (umeboshi) on the white rice. Because it is obvious to the instructor that the white rice goes in before the pickled plum, suppose the instructor gives the information processing system 100 the work instruction "Get it done up to putting the pickled plum in the lunch box." The voice input device 1 inputs this voice information to the instruction input unit 101. From this instruction, the robot can recognize the work "place the pickled plum." However, if the robot executes only the instructed work, only the pickled plum is placed in the lunch box, with no white rice. In other words, the robot cannot make the intended Hinomaru bento from the instruction as input by the user. The instruction information recognized from the voice information refers only to the work of "placing the pickled plum in the lunch box (at its taught position)," and is therefore an incomplete instruction. The estimation unit 104 therefore identifies the work group intended by the user from the input instruction, and causes the robot to execute the works in that group continuously.

In S181, the work group identification unit 106 accepts the work estimated by the estimation unit 104 in order to identify the work group. Specifically, the work "place the pickled plum" (work 3 in FIG. 4(A)) is identified here. In S182, the work group identification unit 106 acquires, from the work storage unit 103, a plurality of patterns of work groups that the robot executes continuously. For example, the work group for making a "Hinomaru bento (target state F)", called work group F, is "set the lunch box (work 1) → put white rice into the lunch box (work 2) → place the pickled plum (work 3)" in FIG. 4(A). The work storage unit 103 also stores other work groups, such as work group G and work group E. Specific examples of work groups are shown in FIG. 4(C). In S183, the work group identification unit 106 identifies the candidate work groups that include the work instructed by the user by collating the plurality of patterns of work groups with the work accepted in S181. For example, from FIG. 4(C), the work group including work 3 is identified as work group F.

In S184, the work group identification unit 106 determines whether there are a plurality of work groups that include the instructed work. If there are, the process proceeds to S185; if the work group can be narrowed down to one, the process proceeds to S186. Here, work 3 is not included in any work group other than the candidate work group F, so the process proceeds to S186. Now consider, for example, a work group H consisting of "work 1 → work 2 → work 3 → work 4". In this case, both work group F and work group H are identified as candidate work groups in S183, and the process proceeds to S185.

In S185, the work group identification unit 106 acquires auxiliary information for identifying the work group to be executed by the robot. Several kinds of auxiliary information are conceivable. In the first case, the auxiliary information is a selection instruction input by the user. For example, the candidate work groups are output to a display device or the like, and the user is prompted to input a selection. In the second case, the auxiliary information is material information on the materials used in any of the plurality of patterns of work groups. Here, the materials estimated from the image information and state transition information indicating the target states that can be reached with those materials are acquired. The material information identifies the materials present in the robot's work space; specifically, the result of recognizing materials in an image of the robot's work space is used. It is assumed here that only the materials used for production are placed in the work space. In the third case, the auxiliary information is production information indicating the sales status of production items. For example, production information on past sales quantities is stored in the state storage unit 105. The work group corresponding to a production item whose past sales quantity exceeds a predetermined value is identified, and best-selling production items are produced preferentially. Information other than these may also be used as the auxiliary information.

In S186, the work group identification unit 106 determines, based on the auxiliary information, the work group to be executed by the robot from among the candidate work groups. If the work group was narrowed down to one in S184, that work group is determined as the work group to be executed by the robot. If the auxiliary information is a selection instruction from the user, the work group selected by the user is identified. Because the user can complete the instruction themselves, the robot can recognize the user's intention accurately, and human error can be reduced. If the auxiliary information is material information, a work group whose works can all be performed with the recognized materials is identified. For example, if the captured image contains a lunch box, white rice, and a pickled plum, works 1 to 3 can be executed. On the other hand, since no pickles are present, work 4 cannot be executed. From these recognition results, work group F is identified. When materials are recognized from image information, the robot itself can complete the instruction without troubling the user, so the work can be executed efficiently. If the auxiliary information is production information, the work group corresponding to the production item with the largest sales quantity is identified from among the work groups that include the instructed work. In this case as well, the robot itself can complete the instruction without troubling the user, so the work can be executed efficiently.
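The material-information and production-information cases can be sketched as simple filters over the candidate work groups. The material each work needs (`needs`), the recognized material set, and the sales figures are hypothetical example data. The sketch also assumes each work group produces exactly one production item, so that sales can be keyed by work group name.

```python
from typing import Dict, List, Optional, Sequence, Set


def select_by_materials(candidates: List[str],
                        groups: Dict[str, Sequence[int]],
                        needs: Dict[int, Set[str]],
                        present: Set[str]) -> Optional[str]:
    """Keep candidates whose every work can be done with the recognized materials.

    Returns the single feasible group, or None if zero or several remain
    (another kind of auxiliary information, e.g. asking the user, is then needed).
    """
    feasible = [g for g in candidates
                if all(needs[w] <= present for w in groups[g])]
    return feasible[0] if len(feasible) == 1 else None


def select_by_sales(candidates: List[str],
                    sales: Dict[str, int]) -> Optional[str]:
    """Pick the candidate whose production item has sold the most."""
    return max(candidates, key=lambda g: sales.get(g, 0), default=None)


groups = {"F": [1, 2, 3], "H": [1, 2, 3, 4]}
needs = {1: {"lunch_box"}, 2: {"lunch_box", "white_rice"},
         3: {"umeboshi"}, 4: {"pickles"}}
present = {"lunch_box", "white_rice", "umeboshi"}  # e.g. from image recognition
print(select_by_materials(["F", "H"], groups, needs, present))  # F
print(select_by_sales(["F", "H"], {"F": 120, "H": 300}))         # H
```

Returning None on ambiguity, rather than guessing, leaves room to fall back on a user selection as in the first case.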

In S9, the work group identification unit 106 determines the execution order of the works based on the identified work group. The work storage unit 103 stores a plurality of patterns of work groups with which the robot 20 operates on the target object 5. A work here is an operation of the information processing system 100, such as gripping, transporting, or placing, and may also be another kind of operation. Because the order in which works are executed differs from one work group to another, the timing for executing the instructed work is determined according to the identified work group. Some work groups contain works that can be executed in any order. In that case, for example, the works may be executed in ascending order of the distance between the material and the robot.
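One way to realize the distance-based ordering of order-free works is shown below. It assumes, hypothetically, that a work group is given as a sequence of stages whose internal works may run in any order, and that the position of each work's material is known; the work names and coordinates are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def execution_order(stages: Sequence[Sequence[str]],
                    material_pos: Dict[str, Point],
                    robot_pos: Point) -> List[str]:
    """Flatten ordered stages into one execution order.

    Stages run in the given order; works inside one stage are order-free
    and are sorted by the distance from the robot to their material.
    """
    order: List[str] = []
    for stage in stages:
        order.extend(sorted(stage,
                            key=lambda w: math.dist(material_pos[w], robot_pos)))
    return order


stages = [["set_box"], ["add_rice"], ["add_plum", "add_pickles"]]
pos = {"set_box": (0.4, 0.0, 0.0), "add_rice": (0.3, 0.2, 0.0),
       "add_plum": (0.5, 0.1, 0.0), "add_pickles": (0.2, 0.1, 0.0)}
print(execution_order(stages, pos, robot_pos=(0.0, 0.0, 0.0)))
# ['set_box', 'add_rice', 'add_pickles', 'add_plum']
```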

In S10, the controller 201 determines the imaging position and orientation for the target object 5. That is, the controller 201 determines the position and orientation to which the manipulator 202 moves for imaging by the imaging device 2. Alternatively, the position and orientation are determined by an imaging position and orientation determination means (not shown) and sent to the controller 201. Whereas the imaging position and orientation determined in S4 target the supply area, S10 targets the target object 5, so an imaging position and orientation suitable for determining the gripping position and orientation of the target object 5 are determined. Any method may be used as long as the target object 5 can be imaged.

In S11, the controller 201 controls the manipulator 202 for imaging. In S12, the imaging device 2 images the target object 5 from the position and orientation determined in S10. In S13, the visual information acquisition unit 102 acquires the image information of the target object 5 output by the imaging device 2 and sends it to the generation unit 107.

In S14, the generation unit 107 generates position and orientation control information for the robot 20. When the target object 5 is to be gripped, the generation unit 107 acquires the image information of the target object 5 captured in S12 and determines the gripping position and orientation for the target object 5. Any determination method may be used as long as the gripping position and orientation of the target object 5 can be obtained. For example, if a template image of the target object 5 is stored, the gripping position and orientation are obtained based on a gripping position and orientation set in advance with respect to the object coordinate system of the template. Specifically, a gripping position and orientation are first set with respect to the object coordinate system defined in advance for the template of the target object 5. Because the gripping position and orientation differ depending on the type of the gripping device 203, they are set according to each gripping method, such as gripping with a chuck or gripping by suction. Then, when the information processing system 100 executes the processing, the position and orientation of the target object 5 are obtained by template matching; this gives the object coordinate system of the target object 5, relative to which the gripping position and orientation can be obtained. Alternatively, the gripping position and orientation may be obtained directly from the image information of the target object 5 using a model trained in advance by deep learning. Alternatively, when the gripping device 203 grips with a chuck, the gripping position and orientation may be determined by finding a pair of opposing surfaces of the target object 5 based on the opening and closing width of the chuck. Alternatively, when the gripping device 203 grips by suction, the gripping position and orientation may be determined by finding a flat portion of the target object 5 that the suction pad can hold, based on the dimensions of the pad. Methods other than these may also be used.
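The template-based method amounts to chaining two rigid transforms: the object pose obtained by template matching and the grasp pose registered in the template's object frame. The NumPy sketch below shows that composition with hypothetical numbers; it does not perform the template matching itself, and the helper names are illustrative.

```python
import numpy as np


def homogeneous(R: np.ndarray, t) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta: float) -> np.ndarray:
    """Rotation by theta (rad) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Pose of the object frame in the camera frame, e.g. from template matching.
T_cam_obj = homogeneous(rot_z(np.pi / 2), [0.10, 0.00, 0.50])
# Grasp pose registered in advance in the template's object frame
# (hypothetical value: 2 cm along the object's x axis, same orientation).
T_obj_grasp = homogeneous(np.eye(3), [0.02, 0.00, 0.00])

# Grasp pose in the camera frame: chain the two transforms.
T_cam_grasp = T_cam_obj @ T_obj_grasp
print(T_cam_grasp[:3, 3])  # approximately 0.1, 0.02, 0.5
```

Because the grasp is stored relative to the object, the same registered offset follows the object wherever template matching finds it; a separate grasp would be registered per gripping method (chuck or suction).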

In S14, position and orientation control information for the robot when transporting or placing the target object 5 is also determined. The placement position and orientation are determined, for example, in the same way as the gripping position and orientation: the image information of the target object 5 captured in S12 is acquired, and the placement position and orientation are determined based on the work group identified in S8. If, according to the identified work group, the target object 5 is to be placed on another target object 5, the relative positional relationship between the two objects is known, so the placement position and orientation of the gripped target object 5 are determined based on the position and orientation information of the destination target object 5. The generation unit 107 may also determine the position and orientation control information not only as a single point but as a plurality of continuous points or a trajectory. For example, a transport path for transporting the target object 5 from the determined gripping position and orientation to the placement position and orientation may be determined. Position and orientation control information for operations other than gripping, placing, and transporting may also be determined. For example, the position and orientation of the robot 20 for moving the target object 5 without grasping it (graspless manipulation), or the imaging position and orientation of the robot 20 for inspecting the target object 5, may be determined. Positions and orientations for other purposes may also be determined.
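A transport path given as a sequence of points could, for example, lift the object, move it across, and lower it. The sketch below generates such straight-line waypoints. The lift height, the number of points per segment, and the fixed gripper orientation are assumptions for illustration, not part of the described method, and no collision checking is done.

```python
import numpy as np


def transport_path(p_grasp, p_place, lift: float = 0.10,
                   n: int = 5) -> np.ndarray:
    """Straight-line waypoints: lift, move across, then lower.

    Positions only; the gripper orientation is assumed to stay fixed.
    Returns an array of shape (1 + 3 * n, 3).
    """
    a = np.asarray(p_grasp, dtype=float)
    b = np.asarray(p_place, dtype=float)
    up = np.array([0.0, 0.0, lift])
    corners = [a, a + up, b + up, b]
    points = [corners[0]]
    for start, end in zip(corners[:-1], corners[1:]):
        for k in range(1, n + 1):
            points.append(start + (end - start) * (k / n))
    return np.vstack(points)


path = transport_path([0.0, 0.0, 0.0], [0.3, 0.1, 0.0])
print(path.shape)  # (16, 3)
```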

In S15, the controller 201 controls the manipulator 202 and the gripping device 203 to operate on the target object 5. Specifically, the controller 201 determines how to control the manipulator 202 and the gripping device 203 based on the position and orientation determined in S14. The position and orientation determined in S14 are expressed in the image coordinate system, but they must be converted into the robot coordinate system in order to control the robot 20. The coordinate conversion is described below, taking the gripping position and orientation as an example.

FIG. 8 shows the relationship between the coordinate systems. The coordinate system Σc of the imaging device 2, the base coordinate system Σb of the manipulator 202, the mechanical interface coordinate system Σm at the tip of the manipulator 202, the tool coordinate system Σt at the tip of the gripping device 203, and the target object coordinate system Σo used when gripping the target object 5 are handled in a unified manner. For this purpose, as shown in FIG. 8, a world coordinate system Σw is set as the reference coordinate system in the work space. First, the displacement from the world coordinate system Σw to the manipulator base coordinate system Σb is denoted (BX, BY, BZ), and the 3 × 3 rotation matrix representing the orientation of the manipulator 202 is denoted BM. The displacement from the manipulator base coordinate system Σb to the manipulator tip coordinate system Σm is denoted (MX, MY, MZ), and the 3 × 3 rotation matrix representing the orientation of the manipulator tip is denoted MM. The displacement from the manipulator tip coordinate system Σm to the coordinate system Σt of the gripping device 203 is denoted (TX, TY, TZ), and the 3 × 3 rotation matrix representing the orientation of the tip of the gripping device 203 is denoted TM.

The tip of the gripping device 203 is the part of the gripping device 203 that contacts the target object 5; when the gripping device 203 is a chuck as shown in FIG. 8, it is the center of the chuck tip. The displacement from the manipulator tip coordinate system Σm to the imaging device coordinate system Σc is denoted (CX, CY, CZ), and the 3 × 3 rotation matrix representing the orientation of the imaging device 2 is denoted CM.

Further, the displacement from the imaging device coordinate system Σc to the target object coordinate system Σo of the target object 5 is denoted (OX, OY, OZ), and the 3 × 3 rotation matrix representing the orientation of the target object 5 is denoted OM. The displacement of the target object 5 as seen from the world coordinate system Σw is denoted (WX, WY, WZ), and the 3 × 3 rotation matrix representing its orientation is denoted WM. In the formulas below, a rotation matrix R and a displacement (X, Y, Z) are combined into the 4 × 4 homogeneous transformation matrix

$$
H(R;\,X,Y,Z)=\begin{bmatrix} R & (X,\,Y,\,Z)^{\top} \\ \mathbf{0}^{\top} & 1 \end{bmatrix}.
$$

Now consider the case where the gripping device 203 attached to the tip of the manipulator 202 is in contact with the target object 5, so that the tool coordinate system Σt coincides with the target object coordinate system Σo. Letting the displacement from the manipulator base coordinate system Σb to the manipulator tip coordinate system Σm be (MX1, MY1, MZ1) and the 3 × 3 rotation matrix representing the orientation of the manipulator tip be MM1, the following formula (1) holds.

$$
H(\mathrm{WM};\,\mathrm{WX},\mathrm{WY},\mathrm{WZ})
= H(\mathrm{BM};\,\mathrm{BX},\mathrm{BY},\mathrm{BZ})\,
  H(\mathrm{MM1};\,\mathrm{MX1},\mathrm{MY1},\mathrm{MZ1})\,
  H(\mathrm{TM};\,\mathrm{TX},\mathrm{TY},\mathrm{TZ}) \tag{1}
$$

Further, when the target object 5 is imaged by the imaging device 2, let the displacement from the manipulator base coordinate system Σb to the manipulator tip coordinate system Σm be (MX2, MY2, MZ2) and the 3 × 3 rotation matrix representing the orientation of the manipulator tip be MM2. Then the following formula (2) holds.

$$
H(\mathrm{WM};\,\mathrm{WX},\mathrm{WY},\mathrm{WZ})
= H(\mathrm{BM};\,\mathrm{BX},\mathrm{BY},\mathrm{BZ})\,
  H(\mathrm{MM2};\,\mathrm{MX2},\mathrm{MY2},\mathrm{MZ2})\,
  H(\mathrm{CM};\,\mathrm{CX},\mathrm{CY},\mathrm{CZ})\,
  H(\mathrm{OM};\,\mathrm{OX},\mathrm{OY},\mathrm{OZ}) \tag{2}
$$

Since formulas (1) and (2) both represent the position and orientation of the target object 5 in the world coordinate system Σw, the following formula (3) holds.

$$
H(\mathrm{BM};\,\mathrm{BX},\mathrm{BY},\mathrm{BZ})\,
H(\mathrm{MM1};\,\mathrm{MX1},\mathrm{MY1},\mathrm{MZ1})\,
H(\mathrm{TM};\,\mathrm{TX},\mathrm{TY},\mathrm{TZ})
= H(\mathrm{BM};\,\mathrm{BX},\mathrm{BY},\mathrm{BZ})\,
  H(\mathrm{MM2};\,\mathrm{MX2},\mathrm{MY2},\mathrm{MZ2})\,
  H(\mathrm{CM};\,\mathrm{CX},\mathrm{CY},\mathrm{CZ})\,
  H(\mathrm{OM};\,\mathrm{OX},\mathrm{OY},\mathrm{OZ}) \tag{3}
$$

From formula (3), the position and orientation of the manipulator 202 when gripping the target object 5 can be obtained if the following are known: the position and orientation of the manipulator 202 when the target object 5 was imaged; the relationship between the positions and orientations of the manipulator tip coordinate system Σm and the imaging device coordinate system Σc; the relationship between the positions and orientations of the imaging device coordinate system Σc and the target object 5; and the relationship between the positions and orientations of the manipulator tip coordinate system Σm and the gripping device 203. Multiplying both sides of formula (3) on the left by the inverse of H(BM; BX, BY, BZ) and on the right by the inverse of H(TM; TX, TY, TZ) gives

$$
H(\mathrm{MM1};\,\mathrm{MX1},\mathrm{MY1},\mathrm{MZ1})
= H(\mathrm{MM2};\,\mathrm{MX2},\mathrm{MY2},\mathrm{MZ2})\,
  H(\mathrm{CM};\,\mathrm{CX},\mathrm{CY},\mathrm{CZ})\,
  H(\mathrm{OM};\,\mathrm{OX},\mathrm{OY},\mathrm{OZ})\,
  H(\mathrm{TM};\,\mathrm{TX},\mathrm{TY},\mathrm{TZ})^{-1}.
$$

Therefore, the position and orientation of the manipulator 202 for gripping the target object 5 can be obtained from the image of the target object 5 captured by the imaging device 2.
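Formula (3) can be checked numerically. The NumPy sketch below builds each 4 × 4 transform from made-up values, solves for the manipulator tip pose at grasp time as the product of the tip pose at imaging time, the tip-to-camera transform, and the camera-to-object transform, multiplied on the right by the inverse of the tip-to-gripper transform. It then confirms that both chains give the same object pose in Σw. The helper names and all numbers are hypothetical.

```python
import numpy as np


def H(R: np.ndarray, t) -> np.ndarray:
    """4x4 homogeneous transform from rotation R (3x3) and displacement t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta: float) -> np.ndarray:
    """Rotation by theta (rad) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def tip_pose_for_grasp(H_m2, H_c, H_o, H_t) -> np.ndarray:
    """Tip pose in the base frame at grasp time: H_m2 @ H_c @ H_o @ inv(H_t)."""
    return H_m2 @ H_c @ H_o @ np.linalg.inv(H_t)


# Hypothetical example values (metres, radians).
H_b = H(np.eye(3), [0.5, 0.0, 0.0])          # Σw -> Σb
H_m2 = H(rot_z(0.3), [0.3, 0.1, 0.6])        # Σb -> Σm when imaging
H_c = H(np.eye(3), [0.0, 0.05, 0.02])        # Σm -> Σc (hand-eye)
H_o = H(rot_z(-0.2), [0.02, -0.01, 0.40])    # Σc -> Σo (from the image)
H_t = H(np.eye(3), [0.0, 0.0, 0.15])         # Σm -> Σt (gripper)

H_m1 = tip_pose_for_grasp(H_m2, H_c, H_o, H_t)
# Check formula (3): both chains give the same object pose in Σw.
print(np.allclose(H_b @ H_m1 @ H_t, H_b @ H_m2 @ H_c @ H_o))  # True
```

The base transform cancels out of the solution, which is why the world frame and the base frame may be chosen to coincide without affecting the result.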

An example of how each displacement and rotation matrix is obtained will now be described. (BX, BY, BZ) and BM are obtained from the positional relationship with the world coordinate system Σw set when the manipulator 202 is installed. (MX, MY, MZ) and MM are obtained by forward kinematics from the joint angle information of the manipulator 202. (TX, TY, TZ) and TM are obtained from the dimensions of the gripping device 203. (CX, CY, CZ) and CM are obtained from the dimensions of the imaging device 2, or alternatively by calibration of the relative position and orientation between the imaging device 2 and the manipulator 202. For example, they may be obtained from the relative positional relationship with the manipulator 202 found by imaging a known two-dimensional marker with the imaging device 2 while the manipulator 202 takes each of a plurality of different positions and orientations. (OX, OY, OZ) and OM, which relate the imaging device coordinate system Σc to the target object coordinate system Σo, are obtained by imaging the target object 5 with the imaging device 2. Although the world coordinate system Σw and the base coordinate system Σb of the manipulator 202 are treated separately here, they may be made to coincide.

Although the coordinate-system relationships have been described here for gripping the target object 5, placement is handled in the same way. When the placement pose has been determined in S14, replacing the object coordinate system Σo of the target object 5 with the placement destination yields the coordinate-system relationships for placing the target object 5, so the manipulator 202 and the gripping device 203 can be controlled in the same manner.

In S16, based on the control information from S15, the robot 20 performs operations on the target object 5 such as moving, gripping, transporting, and placing. For pick-and-place, the manipulator 202 first moves to the gripping pose, and the gripping device 203 grips the target object 5. The manipulator 202 is then controlled to transport and place the target object 5. Waypoints for the transport may be determined in S14, set in advance, or determined by path planning based on the image information obtained by the imaging device 2. Finally, the manipulator 202 moves to the placement pose determined in S14 and places the target object 5.

In S17, the CPU 21 determines whether there is a next target object 5. If no target object 5 remains, the CPU 21 ends the processing shown in FIG. 3; if one remains, the processing continues and returns to S10. This determination is based, for example, on whether the work group determination result still calls for a target object 5 to be operated on and whether a target object 5 is present in the supply area. The presence of a target object 5 in the supply area is determined from image information obtained by photographing the supply location with the imaging device 2. Alternatively, a sensor may be placed in the supply area and the determination made from its output; for example, a weight sensor in the supply area can measure the weight of its contents, from which the number of remaining target objects 5 is obtained. Other determination methods may also be used.
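The weighing-based check in S17 can be sketched as follows. This assumes a known tare weight for the empty supply area and a roughly uniform weight per target object; `remaining_count` and `should_continue` are hypothetical names, and rounding to the nearest whole object is only one possible choice.

```python
def remaining_count(measured_g: float, tare_g: float, unit_g: float) -> int:
    """Estimate how many target objects remain in the supply area by weighing."""
    if unit_g <= 0:
        raise ValueError("unit weight must be positive")
    net = max(0.0, measured_g - tare_g)  # clamp small negative readings to zero
    return int(net / unit_g + 0.5)       # round to the nearest whole object


def should_continue(objects_still_needed: int, measured_g: float,
                    tare_g: float, unit_g: float) -> bool:
    """S17: continue (return to S10) only if the work group still needs an
    object and at least one object remains in the supply area."""
    return objects_still_needed > 0 and remaining_count(measured_g, tare_g, unit_g) > 0
```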

In the present embodiment, the information processing system 100 operates the target object 5 according to the instructor's instructions. For this reason, an example has been described in which a setting mode, in which the information processing system 100 stands by, and an operation mode, in which it is operating, both exist in one flowchart. This assumes that the robot 20 is a collaborative robot that does not require a safety fence, which has become increasingly common in recent years, and that a person may enter the working range while the robot 20 is operating and intervene to give instructions. However, a single flowchart is not required; the setting mode and the operation mode may be clearly separated and executed in different flowcharts.

<Modification example 1>
A work group can also be identified from the estimated state of the target object. FIG. 4A shows an example of image information of the target object in each state and of the transitions between object states. As shown in FIG. 4B, the state storage unit 105 stores image information and text information for each state of the target object 5. That is, as state transition information, text information is stored in association with image information showing a specific target object in each state. Specifically, state A stores an image of a lunch box and the text "lunch box"; state B, an image of white rice and the text "white rice"; state C, an image of a pickled plum (umeboshi) and the text "pickled plum"; state D, an image of pickles and the text "pickles"; state E, an image of a white rice lunch and the text "white rice lunch"; state F, an image of a Hinomaru bento and the text "Hinomaru bento"; and state G, an image of a pickle lunch and the text "pickle lunch". A single state may also store multiple pieces of image and text information. The object unit for which state information is stored may be a specific object such as "white rice" or "pickled plum", or a class such as "rice" that groups "white rice", "red rice", and "vinegared rice". In FIG. 4A, states A to D are initial states of the target object 5, state E is an intermediate state passed through on the way from an initial state to a target state, and states F and G are target states reached when the work is complete.
The state storage unit 105 also stores whether each state transition is possible. As shown in FIG. 4A, transitions are possible from states A and B to state E, from states C and E to state F, and from states D and E to state G. FIG. 4C is a table conceptually illustrating work groups in which each target object in an initial or intermediate state is associated with the work required to reach an intermediate or target state.
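The state transition information of FIG. 4A can be held in a small lookup structure. A minimal sketch, assuming states are keyed by the letters A to G and each allowed transition is stored as the set of states it consumes; `STATES`, `TRANSITIONS`, and the helper functions are hypothetical and not the patent's storage format.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# States of FIG. 4A with the text information stored for each (FIG. 4B).
STATES: Dict[str, str] = {
    "A": "lunch box", "B": "white rice", "C": "pickled plum", "D": "pickles",
    "E": "white rice lunch", "F": "Hinomaru bento", "G": "pickle lunch",
}

# Allowed transitions: resulting state -> states it is built from.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "E": frozenset({"A", "B"}),
    "F": frozenset({"C", "E"}),
    "G": frozenset({"D", "E"}),
}


def classify_states() -> Tuple[Set[str], Set[str], Set[str]]:
    """Split the states into initial, intermediate, and target states."""
    used_as_source: Set[str] = set().union(*TRANSITIONS.values())
    initial = {s for s in STATES if s not in TRANSITIONS}       # never produced
    target = {s for s in TRANSITIONS if s not in used_as_source}  # never consumed
    intermediate = set(TRANSITIONS) - target
    return initial, intermediate, target


def can_transition(sources: Iterable[str], result: str) -> bool:
    """True if the stored transition information allows sources -> result."""
    return TRANSITIONS.get(result) == frozenset(sources)
```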

The process by which the estimation unit 104 estimates the state of the target object is described in detail with reference to FIG. 18. The estimation unit 104 acquires the state transition information from the state storage unit 105 (S191). Consider the case where the instructor gives an incomplete instruction such as "The next lunch is pickled plum, so set it up." The estimation unit 104 acquires the recognized instruction from the instruction input unit 101 (S192). Based on the state transition information, which relates target objects to the target states that use them, the estimation unit 104 estimates the target object referred to in the spoken instruction for operating the robot (S193). In this example, the text "pickled plum" is extracted from the instruction information and compared with the state transition information, and it is estimated that a pickled plum in the initial state will be supplied. The estimation unit 104 thus identifies, from the voice information and the object state transitions, that the target object is the pickled plum and that its state is the initial state.

Next, based on the information about target state candidates (the state transition information), the estimation unit 104 determines whether there is an achievable target state that uses the target object identified from the instruction information (S194). FIG. 5 shows, starting from the target object and state identified from the voice information (instruction information), the target state candidates as solid lines and the target objects in states not yet identified as dotted lines. As shown in FIG. 5, the voice information alone is enough to estimate that one target object, the pickled plum, is in the initial state. However, it may not determine the initial states of the other target objects (that is, whether they are present) or which intermediate state and target state the user intends. At this stage the pickled plum is estimated to be in the initial state, but because no intermediate or target state consists of the pickled plum alone, neither can be fixed. In other words, the estimation unit 104 recognizes that reaching any target state also requires either the white rice lunch in the intermediate state, or both the lunch box and the white rice in the initial state, that these are missing, and that the target state therefore cannot be achieved. If a target state or intermediate state could be achieved with the pickled plum alone, the process would proceed to S197, which is described later.
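Steps S193 and S194 can be sketched as keyword matching followed by a reachability check. A minimal sketch, assuming the recognized instruction is plain English text and that state names are matched as substrings, longest first; `extract_states` and `achievable_states` are hypothetical names. The check only asks which states are reachable from the available ones, not how many objects each transition consumes.

```python
from typing import Dict, FrozenSet, Set

STATES: Dict[str, str] = {
    "A": "lunch box", "B": "white rice", "C": "pickled plum", "D": "pickles",
    "E": "white rice lunch", "F": "Hinomaru bento", "G": "pickle lunch",
}
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "E": frozenset({"A", "B"}), "F": frozenset({"C", "E"}), "G": frozenset({"D", "E"}),
}


def extract_states(instruction: str) -> Set[str]:
    """S193: find stored state names mentioned in the recognized instruction.
    Longer names are matched first so "white rice lunch" is not also read
    as "white rice"; each match is blanked out of the text."""
    text = instruction.lower()
    found: Set[str] = set()
    for key, name in sorted(STATES.items(), key=lambda kv: -len(kv[1])):
        needle = name.lower()
        if needle in text:
            found.add(key)
            text = text.replace(needle, " ")
    return found


def achievable_states(available: Set[str]) -> Set[str]:
    """S194: intermediate/target states reachable from the available states alone."""
    have = set(available)
    changed = True
    while changed:
        changed = False
        for result, sources in TRANSITIONS.items():
            if result not in have and sources <= have:
                have.add(result)
                changed = True
    return have - set(available)
```

With only the pickled plum known, `achievable_states` returns an empty set, which corresponds to "No in S194".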

When the estimation unit 104 determines that the target object and its state cannot be estimated from the instruction information alone (No in S194), it estimates candidates for target states achievable with the first target object, i.e., the one identified from the instruction (S195). For this purpose it performs object recognition on the image information, for example by semantic segmentation using deep learning. The classes produced by the semantic segmentation correspond to the object (and/or class) units stored in the state storage unit 105. For each region, the probability of belonging to each class is computed, and the region is assigned to the class with the highest probability. The probabilities are computed, for example, with the Softmax function, which makes the per-region class probabilities positive and sum to 1.
Instead of always assigning the most probable class, a threshold may be set so that a region can be reported as belonging to no class. With the Softmax function every probability lies between 0 and 1, so the maximum possible value is 1. If the threshold is set to 0.8, a region whose highest class probability is 0.8 or more is assigned to that class; if it is below 0.8, the region is considered unlikely to belong even to that most probable class and is assigned to no class. When a region is classified, the object (and/or class) unit stored in the state storage unit 105 that corresponds to the class is selected; when it cannot be classified, no unit is selected and processing continues. FIG. 6 shows an example of acquired image information. With FIG. 6 as the input image, it is determined that, among the objects stored in the state transition information, "lunch box", "pickled plum", and "white rice" appear in the image. From this object recognition result, it can be determined that the initial states include the lunch box and the white rice, as shown in FIG. 7, in addition to the pickled plum shown in FIG. 5, so state F, which is built from states A, B, and C, is determined as the target state. State E is estimated as the intermediate state candidate on the way to state F. The estimated target state is then sent to the work group identification unit 106. In FIG. 7, states E and F, drawn with broken lines, are the candidate states estimated from the acquired voice and image information.
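The per-region classification with a rejection threshold can be sketched as follows. This illustrates only the Softmax-plus-threshold rule described above, assuming the segmentation network already outputs one raw score (logit) per class for a region; `classify_region` is a hypothetical name and the network itself is out of scope.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Positive probabilities that sum to 1 (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_region(logits: Sequence[float], classes: Sequence[str],
                    threshold: float = 0.8) -> Optional[str]:
    """Return the most probable class, or None if even that class falls
    below the threshold (the region is left unclassified)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best] if probs[best] >= threshold else None
```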

As one example of the estimation method of the estimation unit 104, the case in which the voice information is insufficient and is supplemented by image information has been described. When the voice information is sufficient, the target state may be selected from the voice information alone. That is, when the instruction information alone identifies a single intermediate or target state using target objects in the initial state (Yes in S194), the process proceeds to S195 and the target state is estimated. In S196, the work group identification unit 106 identifies, based on the estimated target state and the instructed work, a work group that includes the instructed work and suits the situation in which the robot 20 is placed. Target states and work groups are assumed to be linked in advance in the state transition information.
For example, when the intermediate state E "white rice lunch" is estimated in S195, it is reached by a transition from state A "lunch box" and state B "white rice". Because the relative positional relationship between the target objects 5, namely the position of state B "white rice" relative to state A "lunch box", is known, the operation for putting the white rice into the lunch box can be determined. Specifically, pick-and-place is performed using the gripping, transporting, and placing work patterns stored in the work storage unit 103. Each stored work pattern is divided in advance into parts that vary as parameters and parts that do not; when the information processing system 100 executes the work, the varying parts are determined from the recognition result. For example, when gripping the target object 5, the gripping pose changes every time as a parameter determined from the recognition result, whereas values such as the approach speed toward the gripping position and the via point passed before reaching it (defined relative to the gripping position) are fixed. However, if the state storage unit 105 stores information about a particular transition, these values may be changed based on that information. For example, the state storage unit 105 may store, for the transition in which state B "white rice" is put into state A "lunch box", the transport speed for the white rice. In that case, the work group identification unit 106 may change the speed of the work pattern stored in the work storage unit 103 based on the information in the state storage unit 105.
In this way, the work group identification unit 106 determines, for each state transition, the work pattern that moves the target object 5 from one state to the next. It also determines the order of the state transitions so that each transition can be carried out. For example, because state F is reached from state E, the work group is arranged so that the operation for the transition to state F is performed after the operation for the transition to state E.
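Ordering the transitions so that each state exists before it is consumed amounts to a depth-first post-order walk over the transition graph. A minimal sketch, assuming each transition is stored as a (base, item) pair meaning "place item into or onto base"; this convention and the name `work_group` are hypothetical, and each returned step stands for one gripping/transporting/placing work pattern.

```python
from typing import Dict, List, Set, Tuple

# Assumed convention: result -> (base the item is placed into/onto, item placed).
TRANSITIONS: Dict[str, Tuple[str, str]] = {
    "E": ("A", "B"),  # white rice (B) into lunch box (A)
    "F": ("E", "C"),  # pickled plum (C) onto white rice lunch (E)
    "G": ("E", "D"),  # pickles (D) onto white rice lunch (E)
}


def work_group(target: str) -> List[Tuple[str, str, str]]:
    """Return pick-and-place steps (item, base, result) in executable order.
    The post-order visit produces every state before any step that uses it."""
    steps: List[Tuple[str, str, str]] = []
    done: Set[str] = set()

    def visit(state: str) -> None:
        if state not in TRANSITIONS or state in done:
            return  # initial state, or already produced
        base, item = TRANSITIONS[state]
        visit(base)
        visit(item)
        steps.append((item, base, state))
        done.add(state)

    visit(target)
    return steps
```

For target F this yields the white-rice step (producing E) before the pickled-plum step (producing F), matching the ordering rule above.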

Here, "the voice information is sufficient" means the case in which the instructor's work instruction specifies every state precisely, for example: "A lunch box, white rice, and pickled plums will be supplied. First, put the white rice in the lunch box. Next, put a pickled plum on the white rice in the lunch box. That completes the work." In this case, the initial states A, B, and C, the intermediate state E, and the target state F shown in FIG. 7 can be selected from the voice information alone. The image information may still be used to check the selection made from the voice information for errors. Specifically, when image information such as that in FIG. 6 is acquired, object recognition on the image shows that "white rice", "lunch box", and "pickled plum" are present. Confirming that this does not contradict the selection made from the voice information can improve the accuracy of the work executed by the information processing system 100.
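The cross-check between the voice-based selection and the image recognition result can be sketched as a set comparison. A minimal sketch, assuming both results have already been reduced to object names; `unconfirmed_objects` is a hypothetical name, and how a mismatch is handled is left open here as in the text.

```python
from typing import Iterable, List


def unconfirmed_objects(voice_objects: Iterable[str],
                        image_objects: Iterable[str]) -> List[str]:
    """Objects named in the voice instruction that image recognition did not find.
    An empty result means the voice-based selection is consistent with the image."""
    seen = set(image_objects)
    return sorted(o for o in set(voice_objects) if o not in seen)
```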

When only part of the target objects 5 changes, for example because of a change in destination market or product model, the target state may be determined by substituting the new object into the target state candidates estimated before the change. Suppose, for example, that the pickled plum placed on the white rice is changed to pickles, i.e., production switches from the Hinomaru bento of state F in FIG. 4 to the pickle lunch of state G. To make this change, the instructor gives an instruction such as "Change the pickled plum to pickles." In this case, state C ("pickled plum"), which had been selected as part of the target state candidates in FIG. 7, is replaced with state D ("pickles"), and the target state candidates are reselected. Referring to the state storage unit 105, the target state obtained from initial states A, B, and D is state G "pickle lunch", while the intermediate state remains state E "white rice lunch".
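The substitution can be sketched by computing, for each target state, the set of initial states it ultimately consumes and matching that against the updated set. A minimal sketch with hypothetical names (`initial_states_of`, `reselect_targets`), assuming the FIG. 4A transitions and that a target is selected only when its initial states match the available set exactly.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

TRANSITIONS: Dict[str, Tuple[str, ...]] = {"E": ("A", "B"), "F": ("C", "E"), "G": ("D", "E")}
TARGETS: Tuple[str, ...] = ("F", "G")


def initial_states_of(state: str) -> FrozenSet[str]:
    """All initial states consumed, directly or via intermediates, to make `state`."""
    if state not in TRANSITIONS:
        return frozenset({state})
    return frozenset().union(*(initial_states_of(s) for s in TRANSITIONS[state]))


def reselect_targets(initial: Set[str], old: str, new: str) -> List[str]:
    """Swap one initial state for another and return the matching target states."""
    updated = (set(initial) - {old}) | {new}
    return [t for t in TARGETS if initial_states_of(t) == updated]
```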

Even if the target object 5 itself is not stored in the state storage unit 105, the target state may be selected based on class information indicating that the target object 5 is similar to a stored target object. Suppose, for example, that the white rice in the lunch is to be changed to red rice, and that "white rice" is registered as an object in the state storage unit 105 but "red rice" is not. If the state storage unit stores not only the object unit "white rice" but also the class unit "rice", the unregistered "red rice" can still be handled. When storing by class, the "rice" class holds rice dishes (foods made from rice) such as "white rice", "vinegared rice", and "barley rice". Then, when the information processing system 100 performs the work, even though "red rice" is not stored in the state storage unit 105, it may be treated in the same way as "white rice" if object recognition in the estimation unit 104 classifies it into the "rice" class. In this case, object recognition is performed per class (rice) rather than per object (white rice, vinegared rice, barley rice, red rice, and so on).
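The object-then-class lookup can be sketched as follows. A minimal sketch under stated assumptions: the registrations in `STATE_BY_OBJECT` and `STATE_BY_CLASS` are hypothetical examples, and the recognizer is assumed to return an object label (possibly unregistered) together with a class label.

```python
from typing import Dict, Optional

# Hypothetical registrations: state B is stored both per object and per class.
STATE_BY_OBJECT: Dict[str, str] = {"lunch box": "A", "white rice": "B", "pickled plum": "C"}
STATE_BY_CLASS: Dict[str, str] = {"rice": "B"}


def resolve_state(object_label: Optional[str], class_label: Optional[str]) -> Optional[str]:
    """Prefer an exact object match; otherwise fall back to the recognized class."""
    if object_label in STATE_BY_OBJECT:
        return STATE_BY_OBJECT[object_label]
    if class_label in STATE_BY_CLASS:
        return STATE_BY_CLASS[class_label]
    return None  # neither the object nor its class is registered
```

Here unregistered "red rice" recognized as class "rice" resolves to the same state as "white rice".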

The case in which the voice information is insufficient has been described as the instructor's utterance lacking content. However, the voice information may also be judged insufficient when the instructor says everything needed but the information processing system 100 fails to catch it, for example because the instructor's voice is quiet or because of noise.

When the instruction contains only information about the target state and none about the initial state, the initial and intermediate states may be derived by working backward from the target state. For example, if the instructor says "Make a Hinomaru bento", the Hinomaru bento of state F in FIG. 4 is determined as the target state, and the states leading to state F are estimated as candidates: states A, B, and C as initial states and state E as an intermediate state.
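Working backward from a named target state can be sketched as a traversal of the transition graph toward its leaves. A minimal sketch assuming the FIG. 4A transitions; `expand_backward` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

TRANSITIONS: Dict[str, Tuple[str, ...]] = {"E": ("A", "B"), "F": ("C", "E"), "G": ("D", "E")}


def expand_backward(target: str) -> Tuple[Set[str], Set[str]]:
    """Return (initial states, intermediate states) needed to reach `target`."""
    initial: Set[str] = set()
    intermediate: Set[str] = set()
    stack: List[str] = [target]
    while stack:
        state = stack.pop()
        if state not in TRANSITIONS:  # nothing produces it: an initial state
            initial.add(state)
            continue
        if state != target:
            intermediate.add(state)
        stack.extend(TRANSITIONS[state])
    return initial, intermediate
```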

In the estimation unit 104, there need not be just one intermediate state between the initial state and the target state; there may be several. In that case, the work group may be narrowed down further by an instruction entered by the user.

If the work group identification unit 106 finds no achievable work group even after sufficiently collecting auxiliary information about the target objects, it may output an error notification. For example, even if the voice information shows that a pickled plum is to be handled, an error is notified if no pickled plum is within the area the robot can operate in. The missing objects may also be estimated so that the robot can recognize whether they have been replenished or are present in the operable area. For example, the estimation unit 104 estimates the missing target objects (second target objects) from the difference between a target state candidate and the first target object. When the Hinomaru bento is a target state candidate, the lunch box (state A) and the white rice (state B) are identified as the second target objects. Next, on the premise that the target state intended by the user is within the robot's operable range, it is determined whether the robot can obtain the second target objects missing for that target state.
That is, the estimation unit 104 determines, based on predetermined information, whether the second target objects are available. If they are, the target state that uses the first target object and the available second target objects can be judged to be the target state the user intends. If the second target objects identified so far are not available, the robot cannot determine what to make or what to do, so it outputs an error notification and, for example, requests further information from the user. Here, image information of the target object 5 is additionally acquired from the visual information acquisition unit 102 as auxiliary information. Specifically, the acquired image information is input to a trained model that outputs the states of the target objects contained in an image, and the recognition result is obtained. If the recognition result includes the identified second target objects, the target state can be achieved using the target objects in the recognition result.
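The missing-object check can be sketched as two set differences: required initial states minus the first target object gives the second target objects, and those minus the recognized objects gives what is unavailable. A minimal sketch with hypothetical names; `recognized` stands in for the trained model's output, which is not modeled here.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

TRANSITIONS: Dict[str, Tuple[str, ...]] = {"E": ("A", "B"), "F": ("C", "E"), "G": ("D", "E")}


def initial_states_of(state: str) -> FrozenSet[str]:
    """All initial states consumed, directly or via intermediates, to make `state`."""
    if state not in TRANSITIONS:
        return frozenset({state})
    return frozenset().union(*(initial_states_of(s) for s in TRANSITIONS[state]))


def check_target(target: str, first_objects: Iterable[str],
                 recognized: Iterable[str]) -> Tuple[str, List[str]]:
    """Return ("ok", second target objects) if all of them are available,
    otherwise ("error", unavailable objects) so the user can be notified."""
    missing = set(initial_states_of(target)) - set(first_objects)
    unavailable = missing - set(recognized)
    if unavailable:
        return "error", sorted(unavailable)
    return "ok", sorted(missing)
```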

As described above, in the present embodiment, initial states, target states, and candidate intermediate states connecting them are first selected from the state storage unit 105 based on the voice information obtained by the instruction input unit 101 and the image information obtained by the visual information acquisition unit 102. Next, operations that can make the transitions between the selected states are selected from the work storage unit 103, and the work group is determined. Then, based on the work group determination result and the image information, position and orientation control information for the robot 20 is determined, and the robot 20 executes the work.

In this way, when a user instructs the information processing system 100 to perform work, the work intended by the instructor can be executed even if the information the system needs is incomplete. Consequently, even a user unfamiliar with robot operation can easily have the robot perform work.

<Second embodiment>
Next, a second embodiment of the present invention is described. The second embodiment extends the first embodiment as follows: when the instruction is insufficient, the output device 3 is used to ask the instructor to confirm whether the information the information processing system 100 supplemented in determining the work group is appropriate. Specifically, the work group is identified as in the first embodiment, and the output device 3 then lets the instructor check the work group identified by the information processing system 100 through images or audio. During this check, the instructor can distinguish whether each decision was based on the voice information, on the image information, or on information supplemented by the information processing system 100. The instructor can thus confirm, before the information processing system 100 starts the work, which work group was identified and on what information. If work differing from the instructor's intention has been determined, it can be corrected before the work starts, which reduces rework and improves work efficiency.

FIG. 9 is a diagram showing a configuration example of an information processing system 100' including the information processing device 10' according to the present embodiment. FIG. 9 shows one example of a device configuration and does not limit the scope of application of the present invention.

When the information processing device 10' visualizes the work group determination result and displays it as an image, the output device 3 is, for example, a display. The output device 3 may be any device other than a display that can show the visualized work group as an image; for example, it may be a projector that displays the work group by projection. It may also be a wearable device such as an HMD (Head Mounted Display) or AR (Augmented Reality) glasses, which the instructor puts on to view the display. When the information processing device 10' makes the identified work group audible and outputs it as voice, the output device 3 is, for example, a speaker. A directional speaker may be used so that sound is output only in a specific direction. Display and/or audio output may also be performed by combining a plurality of these devices.

The work group identification unit 106' identifies, from among a plurality of patterns of work groups, those that include the work indicated by the instruction information as candidates for the work group to be executed by the robot. In this way, the work group of the information processing system 100' is identified. The work group identification unit 106' sends the identified work group to the generation unit 107 and the output unit 108.
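The candidate filtering performed by the work group identification unit 106' can be pictured as a subset test over stored patterns. The following is a minimal sketch, assuming each stored work group is reduced to an ordered tuple of state labels; the `WorkGroup` class, the `identify_candidates` function, and the salmon-bento and onigiri patterns are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class WorkGroup:
    """Hypothetical stored work group: a name plus its ordered state labels."""
    name: str
    states: Tuple[str, ...]


def identify_candidates(patterns: Iterable[WorkGroup], indicated: Iterable[str]) -> List[WorkGroup]:
    """Keep every stored pattern that contains all of the indicated items."""
    wanted = set(indicated)
    return [wg for wg in patterns if wanted <= set(wg.states)]


# Illustrative patterns; only the Hinomaru bento sequence is taken from FIG. 11.
PATTERNS = [
    WorkGroup("hinomaru_bento", ("lunch box", "white rice", "white rice bento", "umeboshi", "hinomaru bento")),
    WorkGroup("salmon_bento", ("lunch box", "white rice", "white rice bento", "salmon", "salmon bento")),
    WorkGroup("onigiri", ("white rice", "umeboshi", "nori", "onigiri")),
]

print([wg.name for wg in identify_candidates(PATTERNS, ["umeboshi"])])  # ['hinomaru_bento', 'onigiri']
```

Because more than one pattern can survive the filter, the result is a list of candidates rather than a single answer; further cues such as image information would narrow it down.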

The output unit 108 presents the work group identified by the work group identification unit 106' on the output device 3, as a display or as voice, so that the instructor can confirm it. For display, the work group is visualized by converting it into image information (referred to as work group information), which is then displayed on the output device 3. For voice output, the work group is made audible by converting it into natural-language work group information, which the output device 3 then outputs as voice.

FIG. 10 is a flowchart showing the work procedure executed by the information processing system 100' and the information processing device 10' in the present embodiment. Only the steps whose processing differs from FIG. 3 of the first embodiment are described below.

In S8', the work group identification unit 106' identifies, from among a plurality of patterns of work groups, those that include the work indicated by the instruction information as candidates for the work group to be executed by the robot. The work group identification unit 106' sends the identified work group to the generation unit 107 and the output unit 108.

In S18, the output unit 108 outputs work group information that visualizes and/or makes audible the work group received from the work group identification unit 106'. FIG. 11 shows an example of a work group visualized by the output unit 108. The output unit 108 displays the initial state, intermediate state, and target state of the work group determined by the work group identification unit 106' so that the instructor can check each of them. The display makes it possible to distinguish whether each state was determined from the voice information of the instructor's instruction, from image information, or by the information processing system 100' supplementing the information.

For example, in FIG. 11, the initial state "umeboshi" (pickled plum) was determined from the instructor's instruction. The "lunch box" and "white rice" were determined from image information, while the intermediate state "white rice bento" and the target state "Hinomaru bento" were determined by the information processing system 100' supplementing information based on the initial states. Accordingly, "umeboshi" is labeled "sound" to show that it was determined from voice information, and "lunch box" and "white rice" are labeled "image" to show that they were determined from image information. Because these initial states were determined from actual voice or image information, they are displayed in solid-line frames. In contrast, "white rice bento" and "Hinomaru bento" are displayed in broken-line frames because the information processing system 100' determined them by supplementing information. In this way, states determined from actual information are displayed distinguishably from states determined by supplementation.
When the information processing system 100' displays supplemented information, it may display only the state that the system judges most probable for the instructor's instruction, or it may display a plurality of states together with the probabilities the system assigned to them. When the judgment is made from image information by deep learning, for example, the likelihood resulting from object recognition is used as the probability. However, the probability may be obtained by other methods, and it may be displayed in other ways.
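One way to reproduce the FIG. 11 distinctions in a text-only prototype is sketched below. It assumes each state carries a provenance tag (`"sound"`, `"image"`, or `"complemented"`) and a dictionary mapping candidate labels to likelihoods. The `render_state` function, the `|...|` and `:...:` stand-ins for solid and broken frames, and the alternative "onigiri" candidate are all hypothetical.

```python
from typing import Dict


def render_state(source: str, likelihoods: Dict[str, float], top_k: int = 1) -> str:
    """Render one state box as text.

    source: "sound" or "image" when the state came from actual input,
            "complemented" when the system supplemented it.
    likelihoods: candidate label -> probability (e.g. object-recognition likelihood).
    """
    ranked = sorted(likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    if source in ("sound", "image"):
        # Solid frame plus a provenance tag, as in FIG. 11.
        return f"|{ranked[0][0]}| [{source}]"
    if top_k == 1:
        body = ranked[0][0]  # show only the most probable supplemented state
    else:
        body = ", ".join(f"{name} {p:.2f}" for name, p in ranked[:top_k])
    return f":{body}:"  # colons stand in for a broken-line frame


print(render_state("sound", {"umeboshi": 1.0}))                                  # |umeboshi| [sound]
print(render_state("complemented", {"hinomaru bento": 0.82, "onigiri": 0.11}, top_k=2))
```

The `top_k` switch mirrors the two display options in the text: a single most-probable state, or several states listed with their probabilities.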

The states displayed in FIG. 11 are each the initial, intermediate, or target state associated with the work group identified by the work group identification unit 106', but other, undetermined states may also be displayed. For example, the work groups that the work group identification unit 106' identified as candidates may be displayed, or any work group stored in the work storage unit 103 may be displayed. Whether to display them may be changed based on the content of the instructor's voice instruction, the display range may be set in advance, or it may be set by another method.

When the work group identification unit 106' cannot identify a work group, it may display a message to that effect. In that case, displaying the part that has been identified so far visualizes how much of the work group intended by the instructor the information processing system 100' has been able to determine.

The method of transitioning between states may also be displayed. For example, a transition method such as grasping the "umeboshi", carrying it to a position above the "white rice bento", and placing it there may be displayed, together with the corresponding motion of the robot 20.

When the output unit 108 makes the result audible so that it can be confirmed by voice, it converts the determination result of the work group identification unit 106' into natural language. For the conversion, for example, information about each state is obtained from the text information stored in the state storage unit 105, and, as in the visual display, the voice output states what information each state was determined from.
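A minimal sketch of the natural-language conversion, assuming each state is available as a (role, label, source) triple whose label text comes from the state storage unit 105. The phrase table and `describe_work_group` are hypothetical, and passing the sentence to an actual speech synthesizer is not shown. The function also returns a fixed message when nothing has been identified, in the spirit of the "could not be identified" display described earlier.

```python
from typing import List, Tuple

# Hypothetical spoken phrases for each way a state can be determined.
SOURCE_PHRASES = {
    "sound": "from your instruction",
    "image": "from the camera image",
    "complemented": "inferred by the system",
}


def describe_work_group(states: List[Tuple[str, str, str]]) -> str:
    """Turn (role, label, source) triples into one sentence for voice output."""
    if not states:
        return "No work group has been identified."
    parts = [f"{role} state {label}, {SOURCE_PHRASES[source]}" for role, label, source in states]
    sentence = "; ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."


print(describe_work_group([
    ("initial", "umeboshi", "sound"),
    ("target", "hinomaru bento", "complemented"),
]))
# Initial state umeboshi, from your instruction; target state hinomaru bento, inferred by the system.
```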

In S19, the output device 3 displays and/or outputs as voice the work group that the output unit 108 visualized and/or made audible, so that the instructor can check the work group. For visualization, the work group may be displayed as a moving image or as text information; when the display is a device capable of AR display, such as an HMD or AR glasses, it may be superimposed on the real objects. Transition methods may be displayed for each state transition, as the whole sequence of the work group, or for only some of the transitions. For audible output, the speaker outputs the work group converted into natural language as voice.

The work state may be made checkable only before the work or also during the work. When it can be checked during the work, the content may be updated as state transitions occur; for example, the instructor may be able to see which state in the state transition diagram the information processing system 100' is currently in. The setting that makes the work state checkable may be changed by voice instruction, set in advance, or set by another method.
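Showing which state the system is currently in during the work can be sketched as locating the observed state in the work group's state sequence and redrawing the chain. The helpers below are hypothetical and assume that the observed state has already been recognized as one of the stored state labels.

```python
from typing import List, Optional


def locate(states: List[str], observed: str) -> Optional[int]:
    """Index of the observed state in the work group, or None if it is not part of it."""
    return states.index(observed) if observed in states else None


def progress_view(states: List[str], current: int) -> str:
    """Render the work group as a chain, marking completed and current states."""
    marks = []
    for i, state in enumerate(states):
        if i < current:
            marks.append(f"{state} (done)")
        elif i == current:
            marks.append(f">{state}<")
        else:
            marks.append(state)
    return " -> ".join(marks)


sequence = ["lunch box", "white rice bento", "hinomaru bento"]
print(progress_view(sequence, locate(sequence, "white rice bento")))
# lunch box (done) -> >white rice bento< -> hinomaru bento
```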

In S20, the information processing system 100' receives voice input from the instructor. If the check reveals no problem, the instructor instructs the system to continue the work. If there is a problem, the instructor gives a correction instruction to fix the work group. When a correction instruction is given, the information processing system 100' updates the determined work group by selecting the target state again and re-determining the work group. At that time, visual information is newly acquired.
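The continue-or-correct exchange in S20 amounts to a loop that re-identifies the work group until the instructor accepts it. This sketch assumes two hypothetical callbacks: `identify`, which re-acquires visual information, reselects the target state, and returns a work group given the corrections so far, and `ask`, which presents the result and returns `"continue"` or a correction. The `max_rounds` cap is an added assumption; the description does not limit the number of corrections.

```python
from typing import Callable, List, Optional


def confirm_loop(
    identify: Callable[[List[str]], str],
    ask: Callable[[str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Re-identify the work group until the instructor answers "continue"."""
    corrections: List[str] = []
    for _ in range(max_rounds):
        work_group = identify(corrections)  # fresh visual info + reselected target state
        reply = ask(work_group)             # presented via output device 3
        if reply == "continue":
            return work_group
        corrections.append(reply)           # keep the correction for the next attempt
    return None                             # no agreement within max_rounds
```

In a real system `ask` would wrap speech recognition of the instructor's answer; here it is just a function returning a string, which keeps the control flow testable.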

As described above, in the present embodiment, the work group to be executed by the robot is identified from voice information and image information. The identified work group is then presented so that the instructor can check it by visualization, by audible output, or both. In this way, the instructor can confirm what work will be executed before the information processing system 100' starts, and if work differing from the instructor's intention has been determined, the work content can be corrected before the work starts. This reduces rework and improves work efficiency.

<Third embodiment>
Next, a third embodiment will be described. The third embodiment extends the first and second embodiments described above: when an object not stored in the work storage unit 103 is handled, images of the intermediate state and target state after the state transition are generated and the instructor is asked to confirm the states, so that the work can continue. Specifically, the work group to be executed by the robot is identified as in the first and second embodiments. Then, as in the second embodiment, the identified work group is confirmed with the instructor. At this time, if the target object 5 is judged to be an object not stored in the state storage unit 105, the confirmation is performed by generating images of the intermediate state and the target state. If the instructor confirms that the generated images are correct, they are newly registered in the state storage unit 105. In this way, even when a new object is handled, the work can continue while the instructor's intention is confirmed, which shortens system downtime and improves productivity.

FIG. 12 is a diagram showing a configuration example of an information processing system 100'' including the information processing device 10'' in the present embodiment. FIG. 12 shows one example of a device configuration and does not limit the scope of application of the present invention.

The instruction input unit 101'' converts the voice information acquired by the voice input device 1 into text information and sends information about the state of and operations on the target object 5 to the estimation unit 104. When a new state is registered in the state storage unit 105'', this information is also sent to the state storage unit 105''.

The visual information acquisition unit 102'' acquires the visual information output from the imaging device 2. The visual information acquisition unit 102'' converts the acquired visual information into image information and outputs it to the estimation unit 104. When a new state is registered in the state storage unit 105'', the image information is also sent to the state storage unit 105''.

The state storage unit 105'' stores state transition information of the target object 5. When new state transition information is registered, the state storage unit 105'' receives text information about the state of and operations on the target object 5 from the instruction input unit 101'' and image information of the target object 5 from the visual information acquisition unit 102'', and registers them.

The work group identification unit 106'' identifies, from among a plurality of patterns of work groups, those that include the work indicated by the instruction information as candidates for the work group to be executed by the robot. The work group identification unit 106'' sends the identified work group to the generation unit 107, the output unit 108'', and the image generation means 109.

When displaying the work group identified by the work group identification unit 106'' on the output device 3, the output unit 108'' visualizes it by converting the work group into an image. When an image has been generated by the image generation means 109, the output unit 108'' receives that image information from the image generation means 109 and displays it.

When the image generation means 109 determines, based on the work group identified by the work group identification unit 106'', that the intermediate state and target state are new states, it generates images of the intermediate state and target state. The generated image information is sent to the output unit 108''.

FIG. 13 is a flowchart showing the work procedure executed by the information processing system 100'' and the information processing device 10'' in the present embodiment. Only the steps whose processing differs from FIGS. 3 and 10 of the first and second embodiments are described below.

In S8'', the work group identification unit 106'' identifies, from among a plurality of patterns of work groups, those that include the work indicated by the instruction information as candidates for the work group to be executed by the robot. The work group identification unit 106'' sends the identified work group to the generation unit 107, the output unit 108'', and the image generation means 109.

In S21, it is determined whether there is an object not stored in the state storage unit 105''. For example, if the instructor's voice instruction refers to an object that is not registered in the state storage unit 105'', the object is determined to be new. Alternatively, if the voice information of the instruction and the acquired image information contradict each other when the work group is determined, the instructor may be asked through the output device 3 to confirm whether the object is new. If the object is determined to be new, images of the intermediate state and target state are generated in S22. If it is determined not to be new, the work group is visualized in S18''.
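The two determination rules of S21 can be sketched as follows, assuming object names have already been extracted from the voice instruction and recognized in the image. `find_new_objects` and the `confirm` callback, which stands in for a question posed to the instructor through the output device 3, are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def find_new_objects(
    instructed: Iterable[str],
    detected: Set[str],
    registered: Set[str],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Return the instructed objects judged to be new (not in the state store)."""
    instructed = list(instructed)
    # Rule 1: the instruction names an object that is not registered at all.
    new = [name for name in instructed if name not in registered]
    # Rule 2: voice and image disagree (a named, registered object is not seen
    # in the image), so ask the instructor whether it is actually a new object.
    for name in instructed:
        if name not in new and name not in detected and confirm(name):
            new.append(name)
    return new
```

An empty result corresponds to the branch to S18''; a non-empty one corresponds to image generation in S22.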

In S22, images of the intermediate state and target state are generated. This is done, for example, by compositing an image of the new object onto images of objects stored in the state storage unit 105''. The region of the new object may be obtained, for example, by background subtraction, by instance segmentation using deep learning, or by other methods. Superimposing the obtained object region on an image of a known object produces images of the intermediate state and target state that include the new object. Alternatively, deep learning may be used to generate the intermediate-state and target-state images directly. For example, a GAN (Generative Adversarial Network) may be trained with a plurality of initial-state images as input and the corresponding intermediate-state and target-state images as output, so that such images can be generated.
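The compositing route of S22 (background subtraction followed by superimposition) is sketched below on tiny grayscale images stored as nested lists, so that it runs without third-party libraries. The threshold value and function names are assumptions; a practical system would more likely use an image-processing library, and the instance-segmentation and GAN alternatives are not shown. The patch is assumed to fit inside the base image at the given offset.

```python
from typing import List

Image = List[List[int]]   # grayscale pixels, row-major
Mask = List[List[bool]]


def background_mask(frame: Image, background: Image, threshold: int = 30) -> Mask:
    """Background subtraction: pixels that differ by more than threshold are foreground."""
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def composite(base: Image, patch: Image, mask: Mask, top: int, left: int) -> Image:
    """Paste the masked pixels of patch onto a copy of base at (top, left)."""
    out = [row[:] for row in base]  # leave the stored image untouched
    for y, (prow, mrow) in enumerate(zip(patch, mask)):
        for x, (pixel, keep) in enumerate(zip(prow, mrow)):
            if keep:
                out[top + y][left + x] = pixel
    return out
```

Feeding a stored "white rice bento" image as `base` and a photo of a new topping as `patch` would yield a candidate target-state image for the instructor to judge in S20''.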

In S18'', the output unit 108'' visualizes the work group received from the work group identification unit 106''.

In S20'', the information processing system 100'' receives voice input from the instructor. Here, the instructor judges whether the generated images are appropriate as the intermediate state and target state, and inputs the judgment to the system by voice.

In S23, the subsequent processing branches on the input result of S20'', that is, on whether the generated images are appropriate as the intermediate state and target state. If they are judged appropriate, processing proceeds to S25; if not, S24 is performed.

In S24, the system asks the instructor to newly register the intermediate state and target state and receives the corresponding input. Because the image generation means 109 could not generate appropriate images, the instructor is asked to actually create the intermediate state and target state, and each state is registered in the state storage unit 105''.

In S25, the image information generated in S22, or the actual object states created by the instructor in S24, are registered in the state storage unit 105''. The registered information is a combination of image information and text information. If it was judged in S23 that the image information generated in S22 is appropriate, that generated image information is registered. Conversely, if it was judged inappropriate, image information is obtained by imaging the states that the instructor was asked to create with real objects in S24, and that image information is registered. Along with the image, text information obtained by converting the instructor's voice information into text is registered. At registration, either per-object or per-class registration may be selected. When registering per class, the target class may be selected from the stored classes, or a new class may be created. The class may be chosen by having the system classify the image information to be registered, displaying the proposed registration destination on the output device 3, and asking the instructor to confirm it. Alternatively, candidate classes may be displayed on the output device 3 and the instructor may select an appropriate one.

Information on whether each transition is possible is also registered, and the transition method may be registered as well. For example, when an object state is registered per class, the transition method may be determined by displaying on the output device 3 the same transition method used by other objects in the class and asking the instructor to confirm it. Alternatively, candidate transition methods may be displayed on the output device 3 and the instructor may select an appropriate one.
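The S25 registration, including the class-level proposal of a transition method, might be organized as below. `StateEntry`, `StateStore`, and the `confirm` callback (standing in for the confirmation on the output device 3) are hypothetical names; automatic class selection by image classification is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class StateEntry:
    """One registered state: image plus text, as described for S25."""
    image_id: str                 # generated image (S22) or photo of the instructor-made state (S24)
    text: str                     # instructor's voice instruction converted to text
    transitionable: bool          # whether the transition is possible
    transition_method: Optional[str] = None


@dataclass
class StateStore:
    """Hypothetical stand-in for state storage unit 105'' with class-level registration."""
    classes: Dict[str, List[StateEntry]] = field(default_factory=dict)

    def register(self, entry: StateEntry, cls: str, confirm: Callable[[str], bool]) -> StateEntry:
        members = self.classes.setdefault(cls, [])
        if entry.transition_method is None:
            # Propose the method already used by another object in the same class.
            known = [m.transition_method for m in members if m.transition_method]
            if known and confirm(known[0]):
                entry.transition_method = known[0]
        members.append(entry)
        return entry
```

If the instructor rejects the proposal, the entry is stored without a method, which would correspond to choosing one from displayed candidates or teaching a new one.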

In S26, the state after the transition is completed is newly registered in the state storage unit 105''. The images registered in S25 were either generated images or images of states created by the instructor, not the results of work actually performed by the information processing system 100''. In S26, image information of the results of work actually performed by the information processing system 100'' is registered, which increases the amount of data. Because this step only adds image information, the text information, transition possibility information, transition method, and other information registered in S25 are applied to it.
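Since S26 only adds image data to an entry that already exists from S25, it can be sketched as an append that leaves the other fields alone. This assumes a plain dictionary per state whose images are tagged by origin; the field names and `register_actual_result` are hypothetical.

```python
from typing import Dict


def register_actual_result(store: Dict[str, dict], state_name: str, image_id: str) -> int:
    """Append a photo of the real post-transition state; return how many such photos exist.

    The text, transition-possibility and transition-method fields registered in S25
    are left untouched and therefore apply to the new image as well.
    """
    entry = store[state_name]                     # must already exist from S25
    entry["images"].append(("actual", image_id))
    return sum(1 for origin, _ in entry["images"] if origin == "actual")


store = {
    "hinomaru bento": {
        "text": "place the umeboshi in the centre of the rice",
        "transitionable": True,
        "transition_method": "grasp-carry-place",
        "images": [("generated", "gen_01")],
    }
}
print(register_actual_result(store, "hinomaru bento", "run_01"))  # 1
```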

As described above, in the present embodiment, the target state is selected from voice information and image information, and the work group is determined. When the work group is determined, if the target object 5 is judged to be an object not stored in the state storage unit 105'', the work group is confirmed with the instructor by generating images of the intermediate state and target state. If the instructor confirms that the generated images are correct, they are newly registered in the state storage unit 105''. In this way, even when a new object is handled, the work can continue while the instructor's intention is confirmed, which shortens system downtime and improves productivity.

[Hardware configuration]
The information processing device 10 is composed of, for example, a personal computer (PC). FIG. 15 shows an example of the hardware configuration of the information processing device 10. The information processing device 10 includes a CPU 11, a ROM 12, a RAM 13, an external memory 14, an input unit 15, a display unit 16, a communication I/F 17, and a system bus 18. The CPU 11 controls the overall operation of the information processing device 10 and controls each component (11 to 17) via the system bus 18. The ROM 12 is a non-volatile memory that stores the programs the CPU 11 needs to execute processing. The programs may instead be stored in the external memory 14 or on a removable storage medium (not shown). The RAM 13 functions as the main memory and work area of the CPU 11. That is, when executing processing, the CPU 11 loads the necessary programs from the ROM 12 into the RAM 13 and executes them to realize various functional operations.

The external memory 14 stores, for example, various data and information that the CPU 11 needs when performing processing using a program. The external memory 14 also stores, for example, various data and information obtained by the CPU 11 performing processing using a program. The input unit 15 consists of, for example, a keyboard and a pointing device such as a mouse, and allows an operator to give instructions to the information processing device 10. The display unit 16 consists of a monitor such as a liquid crystal display (LCD). The communication I/F 17 is an interface for communicating with external devices. The system bus 18 communicably connects the CPU 11, ROM 12, RAM 13, external memory 14, input unit 15, display unit 16, and communication I/F 17. The information processing device 10 is thus communicably connected, via the communication I/F 17, to the external devices, namely the voice input device 1, imaging device 2, output device 3, voice output device 4, light source 6, and robot 20, and controls the operation of these external devices.

<Effects of the embodiments>
According to the first embodiment, when the user instructs the information processing system 100 to perform work, the work intended by the instructor can be executed even if the information processing system 100 lacks some of the information needed to execute it. Therefore, even a user who is not accustomed to operating robots can easily have the robot perform work.

According to the second embodiment, the instructor can confirm what work will be executed before the information processing system 100' starts, and if work differing from the instructor's intention has been determined, the work content can be corrected before the work starts. This reduces rework and improves work efficiency.

According to the third embodiment, even when a new object is handled, the work can continue while the instructor's intention is confirmed, which shortens system downtime and improves productivity.

<Other embodiments>
In the second and third embodiments, the voice input device 1, imaging device 2, and output device 3 may be used not only to confirm the work group determination result but also as an interface to a system used by another instructor. For example, when instructor A calls, through the voice input device 1, to instructor B, who uses another system on another line, instructor B can hear instructor A's call through the output device 3 of that system. Likewise, a user C working at a personal computer may check the information that instructor A input to the information processing system 100' using a display or speaker connected to that personal computer. Conversely, instructor B and user C may send information to instructor A.

In the second embodiment, the information that the output unit 108 makes available for confirmation may be the target state selection result instead of the work group confirmation result. However, when the target state selection result is made confirmable through the output device 3, the intermediate states may not yet be determined and the transition information, including the motions, may not yet be known, so this fact is indicated to the user.

In the third embodiment, when no appropriate work pattern for a newly registered target object 5 is stored in the state storage unit 105, a new work pattern is registered by the position/orientation control information acquisition means 110 and the motion generation means 111. FIG. 14 shows a configuration example in which the position/orientation control information acquisition means 110 and the motion generation means 111 are added to the device configuration of the third embodiment. The position/orientation control information acquisition means 110 receives position/orientation control information from the controller 201'''. The motion generation means 111 generates a motion based on the control information acquired by the position/orientation control information acquisition means 110. The information acquired by the position/orientation control information acquisition means 110 is position/orientation control information, whereas the motion generated by the motion generation means 111 is a work pattern in which these pieces of control information are connected continuously. The motion may be generated by the instructor teaching it to the robot 20''' through direct teaching, by using a teaching pendant, by writing a program, or by any other method.
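The registration flow described for the third embodiment can be pictured with a minimal Python sketch. It assumes that each piece of position/orientation control information is a simple six-value pose sample and that the state storage unit 105 behaves like a dictionary keyed by object; `PoseCommand`, `WorkPattern`, `generate_motion`, and `register_if_missing` are hypothetical names, not taken from the source. Collapsing repeated consecutive samples is also an assumption added for illustration. The sketch only connects samples into one work pattern; it does not model how the samples are captured (direct teaching, teaching pendant, or a program).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PoseCommand:
    """One position/orientation control sample (hypothetical fields)."""
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float


@dataclass
class WorkPattern:
    """A work pattern: control samples connected in order (hypothetical type)."""
    name: str
    waypoints: List[PoseCommand] = field(default_factory=list)


def generate_motion(name: str, samples: Iterable[PoseCommand]) -> WorkPattern:
    """Connect control samples into one work pattern.

    Assumption: consecutive identical samples (e.g. while the arm is held
    still during direct teaching) are collapsed into a single waypoint.
    """
    waypoints: List[PoseCommand] = []
    for sample in samples:
        if not waypoints or waypoints[-1] != sample:
            waypoints.append(sample)
    return WorkPattern(name, waypoints)


def register_if_missing(store: Dict[str, WorkPattern], object_id: str,
                        samples: Iterable[PoseCommand]) -> WorkPattern:
    """Return the stored pattern for object_id; generate and store one only if absent."""
    existing = store.get(object_id)
    if existing is not None:
        return existing
    pattern = generate_motion(f"pattern_for_{object_id}", samples)
    store[object_id] = pattern
    return pattern
```

Keeping the check for an existing pattern separate from motion generation mirrors the condition in the text: a new pattern is taught only when none suitable is already stored.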

The estimation unit 104 and other processing units described above may instead perform their processing using a trained model obtained by machine learning. In that case, for example, multiple combinations of input data and output data for the processing unit are prepared as training data, knowledge is acquired from them by machine learning, and a trained model is generated that outputs, based on the acquired knowledge, the output data corresponding to given input data. The trained model can be constructed as, for example, a neural network model. The trained model then performs the processing of the corresponding processing unit by operating, in cooperation with a CPU, a GPU, or the like, as a program that performs processing equivalent to that unit. The trained model may be updated after a certain amount of processing, as necessary.

The present invention can also be realized by executing the following processing. That is, software (a program) that implements the functions of the above-described embodiments is supplied to a system or device via a data communication network or various storage media, and a computer (or a CPU, MPU, or the like) of that system or device reads and executes the program. The program may also be provided recorded on a computer-readable recording medium.

1 Voice input device
2 Image pickup device
3 Output device
5 Target object
6 Light source
10 Information processing device
20 Robot
100 Information processing system
201 Controller
202 Manipulator
203 Gripping device

Claims

An information processing device that causes a robot to execute a work group consisting of a plurality of tasks, the device comprising:
input means for inputting instruction information indicating at least a part of the work in the work group; and
specifying means for specifying, from among work groups of a plurality of patterns, a work group that includes the work indicated by the instruction information as a candidate for the work group to be executed by the robot.

The information processing apparatus according to claim 1, wherein, when there are a plurality of patterns of candidate work groups, the specifying means specifies the work group to be executed by the robot based on supplementary information.

The information processing apparatus according to claim 2, wherein the supplementary information is a selection instruction input by the user.

The information processing apparatus according to claim 2, wherein the supplementary information is material information related to a material used in any of the plurality of patterns of work groups.

The information processing device according to claim 4, wherein the material information is information for identifying a material existing in the work space of the robot.

The information processing apparatus according to claim 4 or 5, wherein the material information is image information obtained by photographing a material existing in the work space of the robot.

The information processing device according to any one of claims 1 to 6, wherein the input means inputs the instruction information which is voice information emitted by a user.

The information processing apparatus according to any one of claims 1 to 7, further comprising a storage means for storing the plurality of patterns of work groups.

The information processing apparatus according to any one of claims 1 to 8, wherein the specifying means determines the order in which work is to be performed based on the specified work group.

The information processing apparatus according to any one of claims 1 to 9, further comprising generation means for generating control information for the robot based on the work group specified by the specifying means.

The information processing apparatus according to any one of claims 1 to 10, further comprising presentation means for presenting to the user the work determined by the determination means and the order of that work.

The information processing apparatus according to claim 9, wherein the presentation means presents the work by generating an image showing the work and the order of the work.

The information processing apparatus according to any one of claims 1 to 12, further comprising a learning means for learning control information of the work when a work not included in the work group is instructed.

A program for causing a computer to function as each means included in the information processing apparatus according to any one of claims 1 to 13.

An information processing method for causing a robot to execute a work group consisting of a plurality of tasks, the method comprising:
an input step of inputting instruction information indicating at least a part of the work in the work group; and
a specifying step of specifying, from among work groups of a plurality of patterns, a work group that includes the work indicated by the instruction information as a candidate for the work group to be executed by the robot.
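The selection logic of claims 1, 2, 4, 5, and 9 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: each stored work group is assumed to be an ordered tuple of task names plus the set of materials it uses, and the supplementary information is assumed to be the set of materials detected in the robot's work space. `WorkGroup`, `specify_candidates`, `narrow_by_materials`, and `specify_work_group` are hypothetical names.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class WorkGroup:
    """One stored work-group pattern (hypothetical representation)."""
    product: str
    tasks: Tuple[str, ...]       # tasks in execution order
    materials: FrozenSet[str]    # materials the work group uses


def specify_candidates(patterns: Sequence[WorkGroup], instructed_task: str) -> List[WorkGroup]:
    """Keep every stored pattern that contains the instructed task (claim 1)."""
    return [g for g in patterns if instructed_task in g.tasks]


def narrow_by_materials(candidates: Sequence[WorkGroup],
                        materials_in_workspace: AbstractSet[str]) -> List[WorkGroup]:
    """Keep candidates whose materials are all present in the work space (claims 2, 4, 5)."""
    return [g for g in candidates if g.materials <= materials_in_workspace]


def specify_work_group(patterns: Sequence[WorkGroup], instructed_task: str,
                       materials_in_workspace: AbstractSet[str]) -> Optional[WorkGroup]:
    """Return a single work group, or None if no candidate or several candidates remain."""
    candidates = specify_candidates(patterns, instructed_task)
    if len(candidates) > 1:
        # Supplementary information is consulted only when the instruction is ambiguous.
        candidates = narrow_by_materials(candidates, materials_in_workspace)
    return candidates[0] if len(candidates) == 1 else None
```

When a single group is returned, its `tasks` tuple supplies the execution order referred to in claim 9. Returning `None` when the candidates cannot be narrowed to one leaves room for the fallback of claim 3, in which the user supplies a selection instruction.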