CN114833830A - Grabbing method and device, electronic equipment and storage medium
- Publication number
- CN114833830A (application CN202210456336.3A)
- Authority
- CN
- China
- Legal status: Withdrawn
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
- B25J9/161—Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1661—Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1664—Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
Abstract
The disclosure relates to a grabbing method and device, electronic equipment and a storage medium. The method includes: determining an action of a mechanical arm and a duration corresponding to the action, the duration including at least one time step, according to an acquired first pose of a target object; generating a single-step trajectory of the mechanical arm for one time step according to the action of the mechanical arm and the first pose of the target object; and repeatedly executing the single-step trajectory according to the duration so that the target object moves to a second pose. The embodiment of the disclosure can adaptively merge repeated actions, thereby reducing the number of inference steps of the reinforcement learning algorithm, improving industrial production efficiency, and at the same time improving the grabbing success rate.
Description
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a grabbing method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of robotics and artificial intelligence, machines can perform many kinds of work in place of human beings. To achieve this, a mechanical arm must be trained with machine learning (such as deep learning and reinforcement learning) so that it can interact with the external environment and accomplish various grabbing tasks.
In an industrial scenario, in order to reduce production cost, a robot arm uses a suction rod with single-point contact as an end effector, and the structure of an object to be grasped (e.g., an industrial part) is often complicated, requiring the robot arm to grasp the object to a specified target position and a specified rotation angle. In this case, if the reinforcement learning method in the related art is used to realize the grasping task of the robot arm, there are many repetitive actions during the entire operation of the grasping task, resulting in a decrease in the efficiency of industrial production.
Disclosure of Invention
The present disclosure provides a technical solution for grabbing.
According to an aspect of the present disclosure, there is provided a grabbing method including: acquiring a first pose of a target object, wherein the first pose comprises a three-dimensional position of the current state of the target object and a three-dimensional corner of the current state of the target object; determining an action of a mechanical arm and a duration corresponding to the action according to the first pose of the target object, wherein the duration comprises at least one time step; generating a single-step trajectory of the mechanical arm according to the action of the mechanical arm and the first pose of the target object, wherein the single-step trajectory is a trajectory generated for one time step; and repeatedly executing the single-step trajectory according to the duration to move the target object to a second pose, the second pose comprising a three-dimensional position of the next state of the target object and a three-dimensional corner of the next state of the target object.
In one possible implementation, the determining, according to the first pose of the target object, an action of a mechanical arm and a duration corresponding to the action includes: determining the state of the mechanical arm according to the first pose of the target object, at least one historical pose of the target object and at least one historical action of the mechanical arm; and determining the action of the mechanical arm and the duration corresponding to the action according to the state of the mechanical arm.
In one possible implementation, the determining, according to the state of the robot arm, an action of the robot arm and a duration corresponding to the action includes: determining the action of the mechanical arm and the duration corresponding to the action by utilizing a strategy according to the state of the mechanical arm; wherein the policy is used to adjust the action and the duration corresponding to the action to accommodate each state.
In one possible implementation, the policy includes a multi-layer perceptron in a reinforcement learning model obtained by a reinforcement learning method, where a state space of the reinforcement learning model represents a state set of the robot arm, an action space represents a cartesian product of the action set and a duration set of the robot arm, a state transition probability represents a probability of a next state after a single-step trajectory corresponding to an action is repeatedly executed according to the duration in each state, an initial state distribution represents a probability distribution for generating an initial state, a reward function represents an evaluation after the single-step trajectory corresponding to the action is repeatedly executed according to the duration in each state, and a discount coefficient is used for adjusting the reward function.
In a possible implementation manner, the reward function includes at least one of a success reward function, a motion reward function, a safety reward function and a time penalty reward function, and the success reward function is used for evaluating whether the second pose of the target object belongs to an expected pose set after a single-step trajectory corresponding to an action is repeatedly executed according to the duration in each state; the motion reward function is used for evaluating the distance between the second pose of the target object and an expected pose after the single-step trajectory corresponding to the action is repeatedly executed according to the duration in each state; the safety reward function is used for evaluating the degree of risk of the mechanical arm action in each state; the time penalty reward function is used for constraining the number of inference steps of the reinforcement learning model.
In one possible implementation, after repeatedly executing the single-step trajectory according to the duration to move the target object to the second pose, the method further includes: determining the second pose of the target object as the first pose of the target object, and re-executing the step of acquiring the first pose of the target object and the subsequent steps.
In one possible implementation, the generating a single-step trajectory of the mechanical arm according to the action of the mechanical arm and a first pose of the target object includes: generating the single-step motion trajectory of the mechanical arm from the action of the mechanical arm and the first pose of the target object by using an inverse dynamics method.
In one possible implementation, the robotic arm includes an end effector, and the end effector includes a single-point-contact effector.
According to an aspect of the present disclosure, there is provided a grabbing apparatus including: an acquisition module, configured to acquire a first pose of a target object, wherein the first pose comprises a three-dimensional position of the current state of the target object and a three-dimensional corner of the current state of the target object; a determining module, configured to determine an action of a mechanical arm and a duration corresponding to the action according to the first pose of the target object, wherein the duration comprises at least one time step; a generating module, configured to generate a single-step trajectory of the mechanical arm according to the action of the mechanical arm and the first pose of the target object, wherein the single-step trajectory is a trajectory generated for one time step; and an execution module, configured to repeatedly execute the single-step trajectory according to the duration to move the target object to a second pose, wherein the second pose comprises a three-dimensional position of the next state of the target object and a three-dimensional corner of the next state of the target object.
In one possible implementation manner, the determining module is configured to: determining the state of the mechanical arm according to the first pose of the target object, at least one historical pose of the target object and at least one historical action of the mechanical arm; and determining the action of the mechanical arm and the duration corresponding to the action according to the state of the mechanical arm.
In one possible implementation, the determining, according to the state of the robot arm, an action of the robot arm and a duration corresponding to the action includes: determining the action of the mechanical arm and the duration corresponding to the action by utilizing a strategy according to the state of the mechanical arm; wherein the policy is used to adjust the action and the duration corresponding to the action to accommodate each state.
In one possible implementation, the policy includes a multi-layer perceptron in a reinforcement learning model obtained by a reinforcement learning method, where a state space of the reinforcement learning model represents a state set of the robot arm, an action space represents a cartesian product of the action set and a duration set of the robot arm, a state transition probability represents a probability of a next state after a single-step trajectory corresponding to an action is repeatedly executed according to the duration in each state, an initial state distribution represents a probability distribution for generating an initial state, a reward function represents an evaluation after the single-step trajectory corresponding to the action is repeatedly executed according to the duration in each state, and a discount coefficient is used for adjusting the reward function.
In a possible implementation manner, the reward function includes at least one of a success reward function, a motion reward function, a safety reward function and a time penalty reward function, and the success reward function is used for evaluating whether the second pose of the target object belongs to an expected pose set after a single-step trajectory corresponding to an action is repeatedly executed according to the duration in each state; the motion reward function is used for evaluating the distance between the second pose of the target object and an expected pose after the single-step trajectory corresponding to the action is repeatedly executed according to the duration in each state; the safety reward function is used for evaluating the degree of risk of the mechanical arm action in each state; the time penalty reward function is used for constraining the number of inference steps of the reinforcement learning model.
In a possible implementation manner, the execution module is further configured to, after the single-step trajectory is repeatedly executed according to the duration so that the target object is moved to the second pose, determine the second pose of the target object as the first pose of the target object, and re-execute the step of acquiring the first pose of the target object and the subsequent steps.
In one possible implementation manner, the generating module is configured to: generate the single-step motion trajectory of the mechanical arm from the action of the mechanical arm and the first pose of the target object by using an inverse dynamics method.
In one possible implementation, the robotic arm includes an end effector, and the end effector includes a single-point-contact effector.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, the action of the mechanical arm and the duration corresponding to the action, which may include at least one time step, can be determined according to the acquired first pose of the target object; a single-step trajectory of the mechanical arm for one time step is generated according to the action of the mechanical arm and the first pose of the target object; and the single-step trajectory is repeatedly executed according to the duration, so that the target object moves to the second pose. In this way, during the grabbing process of the mechanical arm, each action can correspond to a different duration so as to accurately move the target object from the first pose (comprising a three-dimensional position and a three-dimensional corner) to the second pose (comprising a three-dimensional position and a three-dimensional corner). Repeated actions can therefore be merged adaptively, reducing the number of inference steps of the reinforcement learning algorithm, improving industrial production efficiency, and at the same time improving the grabbing success rate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a diagram illustrating reinforcement learning in the related art.
Fig. 2 shows a schematic diagram of an application scenario of a grabbing method according to an embodiment of the present disclosure.
Fig. 3 shows a flow diagram of a grabbing method according to an embodiment of the present disclosure.
Fig. 4 shows a schematic diagram of a target object according to an embodiment of the present disclosure.
Fig. 5 shows a frame schematic of a grabbing method according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of a grasping apparatus according to an embodiment of the present disclosure.
Fig. 7 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Fig. 8 shows a block diagram of another electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In order to better understand the capture method of the embodiment of the present disclosure, the concept of reinforcement learning in the related art is introduced first.
Reinforcement Learning (RL), also known as evaluative learning, is one of the paradigms and methodologies of machine learning. It is used to describe and solve the problem of an agent (i.e., the subject of reinforcement learning, such as an intelligent model acting as a learner or decision maker) learning a policy, in the process of interacting with the environment, so as to maximize reward or achieve a certain goal.
Reinforcement learning is different from supervised learning and unsupervised learning: it does not require any data to be given in advance, but obtains learning information and updates model parameters by receiving rewards (feedback) from the environment for its actions. Supervised learning learns rules from labeled data to implement algorithms such as regression and classification; unsupervised learning finds hidden patterns in unlabeled data to implement algorithms such as clustering and dimensionality reduction.
As shown in the diagram of reinforcement learning in fig. 1, if a certain behavior policy of the agent results in a positive reward (reinforcement signal) from the environment, the tendency of the agent to adopt this behavior policy later is strengthened. The goal of the agent is to find, in each discrete state, the optimal policy that maximizes the expected sum of discounted rewards.
Reinforcement learning regards learning as a heuristic evaluation process: the agent selects an action to apply to the environment, the state of the environment changes after receiving the action, and a reinforcement signal (reward or punishment) is generated and fed back to the agent; the agent then selects the next action according to the reinforcement signal and the current state of the environment, with the principle of increasing the probability of receiving positive reinforcement (reward). The selected action affects not only the immediate reinforcement value, but also the state of the environment at the next moment and the final reinforcement value.
That is, in reinforcement learning the agent learns in a "trial and error" manner, guided by the rewards obtained through interaction with the environment, with the goal of obtaining the maximum reward. Reinforcement learning differs from supervised learning mainly in the reinforcement signal: the reinforcement signal provided by the environment is an evaluation of the quality of an action (usually a scalar signal) rather than an instruction telling the agent how to produce the correct action. Because the information provided by the external environment is very limited, the reinforcement learning system must learn from its own experience. In this way, the reinforcement learning system gains knowledge in an "action-evaluation" loop and improves its actions to suit the environment.
A common model for reinforcement learning is the standard Markov Decision Process (MDP). A reinforcement learning task is referred to as a Markov decision process if it satisfies the Markov property. An MDP is a mathematical model of the stochastic policy and return of an agent in an environment whose state has the Markov property. The Markov property is a concept in probability theory: given the current state and all past states of a stochastic process, the conditional probability distribution of the future state depends only on the current state; in other words, the future state is conditionally independent of the past states given the present state.
Based on the above, the grabbing task of the robot arm can be accomplished using a reinforcement learning method: training with reinforcement learning yields the optimal policy of the reinforcement learning model, thereby realizing a mapping from environment states to grabbing actions.
In the embodiment of the disclosure, the grabbing task of the mechanical arm can be suitable for not only the scenes in daily life, but also the industrial scenes, and has the following characteristics:
firstly, in an industrial scene, in order to reduce production cost and facilitate operation of subsequent industrial processes, a rod-shaped end effector can be used by a mechanical arm, and compared with other end effectors (such as a clamping jaw and a multi-finger manipulator), the rod-shaped end effector performs single-point touch on an object, and the degree of freedom is low;
secondly, the grabbing task requires that an object (such as an industrial part) reaches a specified pose (comprising a three-dimensional position and a three-dimensional corner), and action decision can be made in a 3-dimensional space (such as moving the grabbed object in a certain three-dimensional space);
third, the geometry of the object to be grasped (e.g., an industrial part) may be complex and difficult to model mechanically to control.
In this case, if a reinforcement learning method (for example, the method shown in fig. 1) in the related art is used in the industrial scenario, a large number of repetitive actions are generated, resulting in a reduction in the efficiency of industrial production.
In view of this, in the embodiments of the present disclosure, a first pose of a target object (i.e., an object to be grabbed) is acquired, the first pose being the three-dimensional position and three-dimensional rotation angle of the current state of the target object; an action of the robot arm and a duration corresponding to the action, which may include at least one time step, are determined according to the first pose of the target object; a single-step trajectory of the robot arm for one time step is generated according to the action of the robot arm and the first pose of the target object; and the single-step trajectory is repeatedly executed according to the duration so as to move the target object to a second pose (i.e., the three-dimensional position and three-dimensional rotation angle of the next state of the target object).
In this way, a target object with a complex geometric structure can be grabbed. During the grabbing process of the mechanical arm, each action can correspond to a different duration so as to move the target object from the first pose to the second pose; repeated actions can therefore be merged adaptively, reducing the number of inference steps of the reinforcement learning algorithm, improving industrial production efficiency, and at the same time improving the grabbing success rate.
Fig. 2 shows a schematic diagram of an application scenario of a grabbing method according to an embodiment of the present disclosure. As shown in fig. 2, the end effector of the robot arm may be a single-point touch rod-shaped effector, the target object may be an industrial component, and if the target object is difficult to grasp due to shielding of a material frame wall, the target object may be pre-grasped by using the grasping method according to the embodiment of the disclosure, and the target object is moved to a position and a corner where the target object can be grasped.
It should be understood that fig. 2 is merely illustrative and that the present disclosure does not limit the structure and shape of the robotic arm. For mechanical arms with other structures, the grabbing method of the embodiment of the present disclosure can likewise be used to perform a grabbing operation on the target object and move it to the target pose (including the target three-dimensional position and the target three-dimensional corner). The grabbing method of the embodiment of the disclosure may be applied not only to life scenes (for example, grabbing food and tableware) but also to industrial scenes (for example, grabbing industrial parts); the embodiment of the disclosure does not limit the application scene of the grabbing method.
Fig. 3 shows a flowchart of a grabbing method according to an embodiment of the present disclosure. As shown in fig. 3, the grabbing method includes:
in step S11, acquiring a first pose of a target object, where the first pose includes a three-dimensional position of a current state of the target object and a three-dimensional corner of the current state of the target object;
in step S12, determining an action of the robot arm and a duration corresponding to the action according to the first pose of the target object, where the duration includes at least one time step;
in step S13, generating a single-step trajectory of the robot arm, which is a trajectory generated for one time step, according to the action of the robot arm and the first pose of the target object;
in step S14, repeatedly executing the single-step trajectory according to the duration to move the target object to a second pose, the second pose including a three-dimensional position of the next state of the target object and a three-dimensional corner of the next state of the target object.
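For illustration only, the following Python sketch shows how steps S11-S14, and the round-by-round iteration to the target pose described later, might fit together. All callables passed into the function (pose acquisition, policy, trajectory generation, execution and the termination check) are hypothetical placeholders and not part of the disclosure.

```python
from typing import Callable, List, Sequence, Tuple

def grabbing_loop(
    acquire_pose: Callable[[], Sequence[float]],        # S11: returns the 6-D first pose
    policy: Callable[[object], Tuple[Sequence[float], int]],  # S12: state -> (action, n steps)
    make_single_step_traj: Callable[[Sequence[float], Sequence[float]], object],  # S13
    execute_traj: Callable[[object], None],              # sends one single-step trajectory to the arm
    reached_target: Callable[[Sequence[float]], bool],   # termination check: p in P and o in O
    max_rounds: int = 100,
) -> None:
    """Minimal sketch of steps S11-S14, iterated round by round."""
    poses: List[Sequence[float]] = []
    actions: List[Sequence[float]] = []
    for _ in range(max_rounds):
        first_pose = acquire_pose()                      # S11: acquire the first pose
        poses.append(first_pose)
        state = (actions[-3:], poses[-3:])               # recent history used as the state
        action, n_steps = policy(state)                  # S12: action + adaptive duration
        actions.append(action)
        traj = make_single_step_traj(action, first_pose) # S13: trajectory for ONE time step
        for _ in range(n_steps):                         # S14: repeat for the whole duration
            execute_traj(traj)
        if reached_target(acquire_pose()):               # stop once the target pose is reached
            break                                        # otherwise the second pose becomes the
                                                         # first pose of the next round
```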
In one possible implementation, the robot and the environment in which it is located may be simulation models written in program code. In this case, the grabbing method may be performed by an electronic device such as a terminal device or a server; the terminal device may be a user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server.
In a possible implementation manner, the robot may be a real robot in the real world, and the environment in which it is located is a real environment. In this case, the grabbing method may be controlled and executed by a processor of the robot, where the processor includes one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processor), a Field Programmable Gate Array (FPGA) chip, and the like. The present disclosure is not limited to a particular type of processor.
In one possible implementation, the target object is an object to be grasped. Fig. 4 shows a schematic diagram of a target object of an embodiment of the present disclosure. As shown in fig. 4, the target object is shown in the form of an industrial component, however, fig. 4 is only an illustrative example, and may be any other object with a regular or irregular geometric shape, and the shape and size of the target object are not limited by the present disclosure.
In one possible implementation, in step S11, an image including the target object may be captured by an image capture device (e.g., a binocular camera), and pose detection may be performed on the target object in the image to acquire a first pose of the target object; alternatively, a function module (for example, including a three-dimensional laser scanner) capable of detecting the first pose of the target object may be invoked through a software interface, and the first pose of the target object may be acquired according to a detection result of the function module. The present disclosure does not limit the specific method of acquiring the first pose of the target object.
The first pose of the target object may be a 6D (six-degree-of-freedom) pose, which may include the three-dimensional position of the current state of the target object and the three-dimensional corner of the current state of the target object (i.e., Euler angles, which represent the rotation angles of the target object around its coordinate axes).
For example, taking a spatial rectangular coordinate system as an example, the three-dimensional position of the target object may be represented as coordinates (x, y, z) in the spatial rectangular coordinate system, and the three-dimensional rotation angle of the target object may be represented as a roll angle rotating around an x axis, a yaw angle rotating around a y axis, and a pitch angle rotating around a z axis. It should be understood that the first pose of the target object may be represented using a polar coordinate system, a spherical coordinate system, a cylindrical coordinate system, etc., in addition to the spatial rectangular coordinate system, and the present disclosure does not limit the coordinate system used to represent the first pose of the target object, and the specific representation method.
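For illustration only, a 6D pose as described above could be held in a simple structure such as the one below; the field names and the use of a rectangular coordinate system with roll/yaw/pitch angles are assumptions made for this sketch, not requirements of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Pose6D:
    # three-dimensional position in a spatial rectangular coordinate system
    x: float
    y: float
    z: float
    # three-dimensional corner (Euler angles) around the x, y and z axes
    roll: float
    yaw: float
    pitch: float

# example: a first pose of the target object (values are arbitrary)
first_pose = Pose6D(x=0.12, y=-0.05, z=0.30, roll=0.0, yaw=1.57, pitch=0.0)
```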
Having acquired the first pose of the target object in step S11, an action of the robotic arm and a duration of the action including at least one time step may be determined from the first pose of the target object in step S12. For example, a simulation environment can be created according to the application scene in which the mechanical arm actually grabs, and reinforcement learning can be used to find the optimal policy. The policy of the reinforcement learning model may include a multi-layer perceptron, which can be trained using reinforcement learning to obtain the optimal policy of the reinforcement learning model, so as to realize a mapping from the state of the target object to the action of the robotic arm and the corresponding duration of the action. After the optimal policy is obtained, the first pose of the target object can be input into the optimal policy, which outputs the action of the mechanical arm and the duration corresponding to the action.
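As a hedged illustration of such a policy, the PyTorch sketch below maps a state vector to a spherical-coordinate action (β, θ, φ) and a duration of n time steps. The state dimension (three 6D poses plus three 3D actions), the hidden sizes, the maximum duration and the argmax selection of the duration are all assumptions for this sketch rather than details prescribed by the disclosure.

```python
import torch
import torch.nn as nn

class FrequencyAdaptivePolicy(nn.Module):
    """Hypothetical MLP policy: state -> (spherical action, duration in time steps)."""

    def __init__(self, state_dim: int = 27, hidden: int = 256, n_max: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.action_head = nn.Linear(hidden, 3)        # (beta, theta, phi) in spherical coordinates
        self.duration_head = nn.Linear(hidden, n_max)  # logits over durations 1..n_max

    def forward(self, state: torch.Tensor):
        h = self.backbone(state)
        action = self.action_head(h)
        n = torch.argmax(self.duration_head(h), dim=-1) + 1   # duration of at least one time step
        return action, n
```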
If the duration time corresponding to the action of the mechanical arm comprises a larger number of time steps, the duration time of the action is longer; conversely, if the duration corresponding to the movement of the robot arm includes a smaller number of time steps, the duration of the movement is shorter.
In one possible implementation, step S12 may include: determining the state of the mechanical arm according to the first pose of the target object, at least one historical pose of the target object and at least one historical action of the mechanical arm; and determining the action of the mechanical arm and the duration corresponding to the action according to the state of the mechanical arm.
Exemplarily, assuming that the current state corresponds to time t, in order to retain state information related to the time sequence, the first pose q_t of the target object at time t (i.e., the second pose of the target object obtained at time t-1), the historical pose q_{t-1} of the target object at time t-1 (i.e., the second pose obtained at time t-2), the historical action a_{t-1} of the mechanical arm at time t-1, the historical pose q_{t-2} of the target object at time t-2 (i.e., the second pose obtained at time t-3), the historical action a_{t-2} of the mechanical arm at time t-2, and the historical action a_{t-3} of the mechanical arm at time t-3 may be merged to determine the state s_t of the mechanical arm at the current time t, namely: s_t = (a_{t-3,t-2,t-1}, q_{t-2,t-1,t}).
Then, based on the optimal policy found by reinforcement learning, the state s_t of the mechanical arm at the current time t can be input into the optimal policy, which outputs the action a_t of the mechanical arm and the duration corresponding to the action a_t.
It should be understood that the above only takes the three latest poses of the target object and the three latest historical actions made by the robot arm as an example, and the embodiments of the present disclosure are not limited thereto, and may utilize a plurality of historical poses and a plurality of historical states in relation to a time sequence.
By the method, the latest poses of the target object and the latest historical actions of the mechanical arm can be combined together and input to the reinforcement learning algorithm as a complete state, state information related to time sequence can be more fully utilized, and the action of the mechanical arm and the duration corresponding to the action can be more accurately determined.
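A minimal sketch of this state construction is given below, assuming 6-dimensional poses, 3-dimensional actions and zero-padding at the start of an episode; these choices are illustrative assumptions.

```python
import numpy as np

def build_state(poses, actions):
    """Merge the three most recent poses q_{t-2,t-1,t} and actions a_{t-3,t-2,t-1} into s_t."""

    def last_three(seq, width):
        items = [np.asarray(v, dtype=float) for v in list(seq)[-3:]]
        while len(items) < 3:
            items.insert(0, np.zeros(width))      # zero-pad missing history (an assumption)
        return np.concatenate(items)

    a_part = last_three(actions, width=3)         # three 3-D actions (beta, theta, phi)
    q_part = last_three(poses, width=6)           # three 6-D poses
    return np.concatenate([a_part, q_part])       # s_t = (a_{t-3,t-2,t-1}, q_{t-2,t-1,t})
```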
Having determined the action of the robot arm and the duration corresponding to the action in step S12, a single-step trajectory of the robot arm, which is a trajectory generated for one time step, may be generated from the action of the robot arm and the first pose of the target object in step S13.
In one possible implementation, step S13 may include: generating the single-step motion trajectory of the mechanical arm from the action of the mechanical arm and the first pose of the target object by using an inverse dynamics method.
For example, the generated action of the mechanical arm and the first pose of the target object may be input into a trajectory generator; after receiving the action and the first pose, the trajectory generator may obtain, by an inverse kinematics method, a motion plan of the mechanical arm joints that completes the action, that is, a moving trajectory of the target object for one time step, where the first pose of the target object may be used to determine the starting position of the moving trajectory.
That is, the trajectory generator may obtain the desired motion state information of the robot (including, for example, the positions, velocities and accelerations of the joints) according to the input action and the first pose, and may solve, through inverse dynamics analysis, whether the driving force (or moment) to be applied to each link of the robot can be realized by the motion system.
The inverse dynamics may be solved by using a geometric method, an iterative Newton-Euler Algorithm (Recursive Newton-Euler Algorithm), a lagrangian equation (Lagrange), and the like, which is not limited by the present disclosure.
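For illustration, the sketch below generates a single-step trajectory by converting the spherical-coordinate action into a Cartesian displacement of the end effector and querying an inverse kinematics/dynamics solver along the way. The `solve_ik` callable, the waypoint count and the elevation-angle convention are assumptions; the actual solver could be geometric, recursive Newton-Euler or Lagrangian, as noted above.

```python
import numpy as np

def single_step_trajectory(action, first_pose, solve_ik, n_waypoints=10):
    """Sketch of a trajectory generator for ONE time step.

    `action` = (beta, theta, phi) is an end-effector displacement in spherical coordinates
    (theta taken as an elevation angle); `solve_ik` stands in for an inverse kinematics /
    inverse dynamics solver that returns joint positions for a Cartesian target.
    """
    beta, theta, phi = action
    # spherical -> Cartesian displacement of the end effector (elevation-angle convention)
    delta = beta * np.array([
        np.cos(theta) * np.cos(phi),
        np.cos(theta) * np.sin(phi),
        np.sin(theta),
    ])
    start = np.asarray(first_pose[:3], dtype=float)   # the first pose fixes the starting position
    # interpolate end-effector targets over the single time step and solve IK for each waypoint
    return [solve_ik(start + (k / n_waypoints) * delta) for k in range(1, n_waypoints + 1)]
```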
In this way, the trajectory generator can plan the motion trajectory of the end effector by considering both the action and the pose of the target object, improving the grabbing accuracy of the mechanical arm.
Having determined the single-step trajectory in step S13, in step S14 the single-step trajectory may be repeatedly executed according to the duration (i.e., the movement trajectory of the action is extended) for the number of time steps included in the duration, and the repeated action may be input to the robot arm for execution, so as to change the three-dimensional position and three-dimensional rotation angle of the target object and move the target object to the second pose.
In this way, a target object with a complex geometric structure can be grabbed. During the grabbing process of the mechanical arm, each action can correspond to a different duration, so that the target object can be moved from the first pose (comprising a three-dimensional position and a three-dimensional corner) to the second pose (comprising a three-dimensional position and a three-dimensional corner); repeated actions can therefore be merged adaptively, reducing the number of inference steps of the reinforcement learning algorithm, improving industrial production efficiency, and improving the grabbing success rate.
In one possible implementation, after repeatedly executing the single-step trajectory according to the duration to move the target object to the second pose in step S14, the method further includes step S15: determining the second pose of the target object as the first pose of the target object, and re-executing the step of acquiring the first pose of the target object and the subsequent steps.
For example, assume that the first pose of the target object at the current time t (the current round) is S_t. Through steps S11-S14, the target object can be moved to the second pose S_{t+1}. Through step S15, the second pose S_{t+1} of the target object can be determined as the first pose of the target object at the next time t+1 (the next round), and steps S11-S14 are repeatedly executed until the target object is moved to the target pose.
In this way, the first pose of the current target object can be observed at each time (each round), the action of the mechanical arm and the duration corresponding to the action are determined, and the mechanical arm repeatedly executes the action within the duration to move the target object to the second pose. The target object may be moved from the initial pose to the target pose through multiple rounds of iterative execution, and each action may correspond to a different duration. Compared with the related art, in which each action of the mechanical arm lasts for the same time (i.e., every action is executed at the same frequency) and repeated actions are therefore generated, the method of the embodiment of the present disclosure adaptively merges repeated actions, saves inference steps of the reinforcement learning algorithm, and improves industrial production efficiency.
The following is a description of the grasping method according to the embodiment of the present disclosure.
First, an environment in which the robot arm performs an action may be defined, which may include an interface between the robot arm and the environment. The environment may be a simulation model or a real physical system, which is not limited by this disclosure. In order to reduce the cost and improve the safety of the mechanical arm grabbing experiment, a simulation environment can be preferentially adopted for the grabbing experiment, as shown in fig. 5, and the mechanical arm, the target object and the operation cabin are all simulation models written by program codes.
The grabbing task in the current environment may then be modeled. Assuming that in an industrial scene, a robotic arm uses a rod-like end effector to control the pose of a target object, which may include industrial components of complex geometry, the graspable pose of the target object is typically limited in number and easily occluded, as shown in fig. 2. In such cases, the robotic arm may generate commands that are not feasible, resulting in destructive collisions between the end effector of the robotic arm and other objects (e.g., the walls of the operating chamber). Based on the method, a pose correction task from an infeasible grabbing pose to an available grabbing pose can be established, so that the corrected target object is in the expected target pose, and the accuracy of the next process is improved. As shown in fig. 4, assuming that there are three different configurations of industrial components, each industrial component may include 2 possible grab poses, i.e., the desired target pose after correction.
For example, assuming that the three-dimensional position of the target object is p and its three-dimensional corner is o, and that P and O represent the desired target position set and target corner set of the target object respectively, the termination condition of the grabbing process of the robot arm may be expressed as: p ∈ P and o ∈ O. The method encourages finding a better policy, so that the mechanical arm can take fewer actions and grab the target object to a target pose meeting the termination condition with a higher success rate.
Further, in this process, the robot arm may satisfy the following conditions in the grasping process:
(1) during the whole grabbing process, a target object (such as an industrial part) is located in a preset area (such as an operation cabin);
(2) in order to reduce damage to the target object and/or the end effector of the robot arm, the force and velocity of the end effector of the robot arm may be kept within a preset range. Also, movement of the target object from top to bottom is discouraged;
(3) the end effector of the robotic arm is discouraged from contacting objects other than the target object (e.g., the walls of the operating chamber) to reduce collisions of the end effector of the robotic arm with other objects.
After the grabbing task is modeled, grabbing of the target object can be trained based on a frequency-adaptive reinforcement learning method. Fig. 5 shows a frame schematic of a grabbing method according to an embodiment of the present disclosure. In the reinforcement learning method in the related art, the grabbing operation outputs actions at a certain fixed frequency, for example a fixed frequency of 5 Hz, generating a new action every 0.2 seconds; in this process, a large number of repeated actions are easily generated, increasing the number of inference steps of reinforcement learning. In contrast, the frequency-adaptive reinforcement learning model of the embodiment of the present disclosure (see fig. 5) not only generates an action based on the current state but also generates a suitable duration for each action, and can adaptively merge repeated actions, thereby reducing the number of inference steps of the reinforcement learning algorithm and improving grabbing efficiency and grabbing accuracy.
Reinforcement learning studies the sequential decision process of interaction between an agent (as the subject) and an environment (as the object), that is, the process in which the agent applies a policy: an action is applied in the current state to obtain a new state and a reward. The frequency-adaptive reinforcement learning framework shown in fig. 5 can be formalized as a Markov decision process in which one action lasts n time steps (n ∈ N+, where N+ is the set of positive integers and the value of n may differ for each action). The Markov decision process is a mathematical model that models the stochastic policy and reward of an agent (e.g., the frequency-adaptive reinforcement learning model of fig. 5) in an environment whose state has the Markov property, i.e., given the current state and all past states of a stochastic process, the conditional probability distribution of the future state depends only on the current state.
In one possible implementation, the Markov decision process may include elements such as a state space, an action space, a state transition probability, an initial state distribution, a reward function, a discount coefficient, and so on.
The state space of the reinforcement learning model represents a state set of the mechanical arm, the action space represents a Cartesian product of the action set of the mechanical arm and a duration set, the state transition probability represents the probability of a next state after a single step track corresponding to an action is repeatedly executed according to the duration in each state, the initial state distribution represents the probability distribution for generating the initial state, the reward function represents evaluation after the single step track corresponding to the action is repeatedly executed according to the duration in each state, and the discount coefficient is used for adjusting the reward function.
For example, the Markov decision process may be represented as a tuple (S, A×N+, P, d_0, R, γ).
The state space S may represent the set of states of the environment in the Markov decision process. For example, at time t the state of the environment is s_t ∈ S, with s_t = (a_{t-3,t-2,t-1}, q_{t-2,t-1,t}); that is, s_t may consist of the three most recent first poses q_{t-2,t-1,t} of the target object and the three most recent historical actions a_{t-3,t-2,t-1} of the robot arm. The present disclosure does not specifically limit the state s_t; a plurality of historical poses and historical states related to the time sequence may be utilized.
The action space A×N+ represents the Cartesian product of the action set A of the robot arm and the duration set N+.
The action set A may be the set of points that the end effector of the robot arm can reach. In order to keep the motion domain of the end effector within an appropriate region regardless of the shape of the target object, a spherical space represented in a spherical coordinate system (i.e., a three-dimensional coordinate system in which the position of a point in three-dimensional space is represented by spherical coordinates) may be selected as the motion domain, and the radius of the spherical space is limited to within a preset threshold. For example, assume that at time t the action a_t ∈ A, with a_t = (β_t, θ_t, φ_t), where β_t represents the distance from the coordinate origin, θ_t represents the elevation angle, and φ_t represents the azimuth angle, with the constraint β_t < β_max, where β_max may represent twice the length of the target object. The present disclosure does not limit the specific value of β_max, which can be set according to the actual application scenario.
The duration set N+ may be the set of positive integers. Each action a_t can correspond to n time steps, n ∈ N+, and the step size of each time step is the same. For example, if the step size of a time step is T seconds, the duration corresponding to the action a_t is n×T seconds; for different values of n, the durations corresponding to the action a_t differ. The present disclosure does not limit the value of n.
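As a small illustration of this action space, the helper below clamps a sampled action to the spherical motion domain and converts its duration into seconds; the 0.2 s time step and the choice of β_max are illustrative values only.

```python
import numpy as np

def clip_action(beta, theta, phi, n, beta_max, time_step_s=0.2):
    """Clamp an action to the spherical motion domain and report its duration in seconds."""
    beta = float(np.clip(beta, 0.0, beta_max))   # keep the radius within the preset threshold
    n = max(1, int(n))                           # the duration is at least one time step
    duration_s = n * time_step_s                 # total duration n x T seconds
    return (beta, theta, phi), n, duration_s
```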
The state transition probability P(s_{t+1} | s_t, a, n) represents the probability of the next state s_{t+1} after the single-step trajectory corresponding to the action is repeatedly executed according to the duration in each state s_t, i.e., the probability of transitioning from the current state s_t to a new state s_{t+1} under the policy π.
d_0 represents the initial state distribution, s_0 ~ d_0, which can be used to set the initial state s_0.
R represents the reward function, which is used to determine the evaluation after the single-step trajectory corresponding to the action is repeatedly executed according to the duration in each state.
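For reference, the six elements of the tuple can be grouped as in the sketch below; the field names are assumptions used only to summarize the definitions above (the discount coefficient γ is described later in this section).

```python
from typing import Callable, NamedTuple

class GraspMDP(NamedTuple):
    """The tuple (S, A x N+, P, d_0, R, gamma) of the frequency-adaptive formulation."""
    state_space: object            # S: set of states of the robot arm / environment
    action_space: object           # A x N+: actions paired with durations
    transition: Callable           # P(s_{t+1} | s_t, a, n)
    initial_state_dist: Callable   # d_0: distribution from which the initial state s_0 is drawn
    reward: Callable               # R: evaluation of executing action a for n time steps in state s
    gamma: float                   # discount coefficient, in [0, 1)
```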
In one possible implementation, the reward function includes at least one of a success reward function, a motion reward function, a safety reward function and a time penalty reward function. The success reward function is used for evaluating whether the second pose of the target object belongs to an expected pose set after the single-step trajectory corresponding to the action is repeatedly executed according to the duration in each state; the motion reward function is used for evaluating the distance between the second pose of the target object and an expected pose after the single-step trajectory corresponding to the action is repeatedly executed according to the duration in each state; the safety reward function is used for evaluating the degree of risk of the mechanical arm action in each state; the time penalty reward function is used for constraining the number of inference steps of the reinforcement learning model.
For example, the reward function R may be expressed as:
R = 100·r_v + 0.2·r_px + 0.2·r_po + 10·r_fx + 5·r_fz − 0.1    (1)
It should be appreciated that the reward function R may include at least one of the success reward function r_v, the motion reward functions r_px and r_po, the safety reward functions r_fx and r_fz, and the time penalty reward function −0.1 (a constant function). The function terms included in the reward function R and the weighting coefficients of the function terms are not particularly limited in the present disclosure and may be set according to the actual application scenario; the weighting coefficients of the function terms in formula (1) are merely illustrative, and the present disclosure is not limited thereto.
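A sketch of the weighted sum in formula (1) is shown below; the individual reward terms are passed in as already-computed values because formulas (2)-(6) are only partially reproduced here.

```python
def total_reward(r_v, r_px, r_po, r_fx, r_fz):
    """Weighted sum of the reward terms as in formula (1); the weights are illustrative."""
    return 100.0 * r_v + 0.2 * r_px + 0.2 * r_po + 10.0 * r_fx + 5.0 * r_fz - 0.1
```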
In formula (1), r_v represents the success reward function, which can be expressed as formula (2): r_v = 1 if q ∈ G, and r_v = 0 otherwise, where q represents the second pose of the target object and G represents the set of expected poses of the target object. A discrete reward can be returned to the policy by checking whether the second pose q of the target object is in the expected pose set G after the single-step trajectory corresponding to the action is repeatedly executed according to the duration in each state. For example, in the case where the second pose q of the target object belongs to the expected pose set G, the value of the success reward function r_v is 1; in the case where the second pose q does not belong to the expected pose set G, the value of r_v is 0.
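A minimal sketch of this success reward is given below. Because checking exact membership of a continuous pose in a finite set is brittle, the sketch treats membership as "within a small tolerance of some expected pose"; the tolerance is an assumption, not part of the disclosure.

```python
import numpy as np

def success_reward(second_pose, expected_poses, tol=1e-2):
    """r_v = 1 if the second pose q belongs to the expected pose set G, and 0 otherwise."""
    q = np.asarray(second_pose, dtype=float)
    for g in expected_poses:
        if np.all(np.abs(q - np.asarray(g, dtype=float)) < tol):
            return 1.0
    return 0.0
```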
In formula (1), the motion reward function may include a motion reward function r_px for evaluating the three-dimensional position of the target object and a motion reward function r_po for evaluating the three-dimensional corner of the target object.
By setting the successful reward function, the target object can be rapidly and accurately grabbed to the target pose.
The motion reward function r_px can be expressed as formula (3).
In formula (3), v_c represents the moving speed of the target object (i.e., the moving speed of the end effector of the robot arm); x_{t−1} represents the three-dimensional position of the target object before the single-step trajectory corresponding to the action is repeatedly executed for the duration in each state; x_t represents the three-dimensional position of the target object after that execution; and the remaining term represents the desired three-dimensional position of the target object.
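Formula (3) itself is not reproduced in the text above. Purely as an illustration of a position-shaping term of this kind, a sketch might look as follows; the distance-plus-speed form and the v_max threshold are assumptions, not the disclosure's actual formula:

```python
import numpy as np

def motion_reward_px(x_t, x_desired, v_c, v_max=0.1):
    """Hypothetical stand-in for formula (3): closer to the desired position is
    better, and end-effector speeds above v_max are discouraged."""
    x_t, x_desired, v_c = (np.asarray(v, dtype=float) for v in (x_t, x_desired, v_c))
    distance_term = -float(np.linalg.norm(x_t - x_desired))   # closer is better
    speed_term = -max(0.0, float(np.linalg.norm(v_c)) - v_max)  # cap the moving speed
    return distance_term + speed_term
```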
The motion reward function r_po can be expressed as formula (4).
In formula (4), the term represents the deviation between the three-dimensional rotation angle (orientation) ρ of the target object, after the single-step trajectory corresponding to the action is repeatedly executed for the duration in each state, and the desired three-dimensional rotation angle.
Setting the motion reward function helps keep the moving speed of the target object within a reasonable range and helps move the target object to the target pose quickly and accurately.
In formula (1), the safety reward function may include a safety reward function r_fx for keeping the target object within a predetermined space (e.g., the space inside the operation cabin), a safety reward function r_fz for constraining the force applied by the end effector of the robot arm, and other reward functions for evaluating the degree of danger of the robot arm action in each state.
The safety reward function r_fx can be expressed as formula (5).
In formula (5), x represents the three-dimensional position of the target object, x_min represents the minimum operating space boundary, and x_max represents the maximum operating space boundary.
By setting the safety reward function r_fx, the target object can be kept inside the operation cabin throughout the grasping process, preventing it from leaving the operation cabin or colliding with the cabin wall.
The safety reward function r_fz can be expressed as formula (6).
In formula (6), f_z represents the normal force applied by the end effector of the robot arm along the z-axis of the target object (e.g., the normal force of the end effector on the contact plane of the target object), and f_max represents the maximum normal force that the end effector of the robot arm can apply to the target object.
By setting the safety reward function r_fz, pressing of the target object by the end effector of the robot arm is reduced, which reduces damage to the target object caused by the end effector applying excessive force.
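Formulas (5) and (6) are likewise not reproduced above, so the sketch below only illustrates plausible forms of the two safety terms (an in-bounds indicator for r_fx and a normalized force penalty for r_fz); both forms are assumptions:

```python
import numpy as np

def safety_reward_fx(x, x_min, x_max):
    """Hypothetical stand-in for formula (5): zero while the target object stays
    inside the operating-space boundaries, a penalty otherwise."""
    x, x_min, x_max = (np.asarray(v, dtype=float) for v in (x, x_min, x_max))
    return 0.0 if bool(np.all((x >= x_min) & (x <= x_max))) else -1.0

def safety_reward_fz(f_z, f_max):
    """Hypothetical stand-in for formula (6): penalize the normal force in
    proportion to how close it gets to the maximum allowed force f_max."""
    return -min(1.0, max(0.0, f_z / f_max)) if f_max > 0 else 0.0
```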
γ represents the discount coefficient, γ ∈ [0,1), which can be used to adjust the weight of each reward in the cumulative reward.
The time penalty reward function can be a constant function, for example, a constant function with a value of-0.1, and the specific value of the time penalty reward function is not limited in the disclosure and can be set according to a specific application scene.
Setting the time penalty reward function helps reduce the number of inference calls of the reinforcement learning model, that is, it helps reduce the number of states traversed (i.e., the number of actions taken by the robot arm) while the robot arm moves the target object from the initial pose to the target pose.
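Pulling the terms together, formula (1) is a plain weighted sum. In the sketch below the weights are those stated in formula (1); the individual term values would come from whichever concrete reward forms are actually used:

```python
def total_reward(r_v, r_px, r_po, r_fx, r_fz):
    """Weighted sum from formula (1); the constant -0.1 is the time penalty term."""
    return 100 * r_v + 0.2 * r_px + 0.2 * r_po + 10 * r_fx + 5 * r_fz - 0.1
```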
In an example, according to the tuple (S, A×N_+, P, d_0, R, γ) of the Markov decision process, the optimal strategy of the frequency-adaptive reinforcement learning model can be solved by maximizing the cumulative reward, namely:

π* = argmax_π E[ Σ_t γ^t · R(s_t, a_t, n_t) ]    (7)
in the formula (7), γ represents a discount coefficient for balancing the long-term benefit and the short-term benefit, the larger the value of γ, the more the long-term benefit is weighted, the benefit is the accumulation of the rewards, R represents a reward function, s t Strategy pi: S → A X N of frequency-adaptive reinforcement learning model representing state at t moment + Can be regarded as input as state s t The output is action a t And action a t The meaning of the function of the corresponding n time steps can be expressed as the state s at time t t Action a to be selected t And the duration corresponding to the action, wherein the state s t E.g., S, action a t ∈A,n∈N + 。π * Representing an optimal strategy.
It should be understood that formula (7) can be solved using reinforcement learning algorithms such as Q-learning, the SARSA algorithm, policy gradient methods and their derivatives, and the disclosure is not limited thereto.
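As a generic illustration of that family of solvers (not the training procedure of the disclosure), a tabular Q-learning sketch over a discretized action-duration space might look as follows; the env.reset() and env.step(a, n) interface is an assumption used only for illustration:

```python
import random
from collections import defaultdict

def q_learning(env, actions, durations, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Generic tabular Q-learning over (action, duration) pairs, i.e. a truncated
    A x N+ choice set; states returned by env must be hashable."""
    q_table = defaultdict(float)
    choices = [(a, n) for a in actions for n in durations]
    for _ in range(episodes):
        state = env.reset()                          # sample s0 ~ d0 (assumed API)
        done = False
        while not done:
            if random.random() < eps:                # epsilon-greedy exploration
                choice = random.choice(choices)
            else:
                choice = max(choices, key=lambda c: q_table[(state, c)])
            next_state, reward, done = env.step(*choice)  # repeat the single-step trajectory n times
            best_next = max(q_table[(next_state, c)] for c in choices)
            td_target = reward + gamma * best_next
            q_table[(state, choice)] += alpha * (td_target - q_table[(state, choice)])
            state = next_state
    return q_table
```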
In this way, based on the tuple (S, A×N_+, P, d_0, R, γ) of the Markov decision process, the optimal strategy of the frequency-adaptive reinforcement learning model can be determined more efficiently, and the current state can then be fed into the optimal strategy to output the action of the robot arm and the duration of the action.
In one possible implementation, the actions of the robot arm and the duration corresponding to the actions are determined by a policy according to the state of the robot arm. The strategy comprises a multi-layer perceptron in a reinforcement learning model obtained by a reinforcement learning method, and the strategy is used for adjusting the action and the duration corresponding to the action to adapt to each state.
For example, suppose the optimal strategy π* of the frequency-adaptive reinforcement learning model has been solved according to formula (7), and the strategy π* is used to grasp the target object to the target pose. Assume the initial pose of the target object is s_0. The initial first pose s_0 is input into the strategy; π*(s_0) outputs the action a_0 and the n_0 time steps corresponding to a_0, and the robot arm repeatedly executes action a_0 within n_0 time steps, grasping the target object to a second pose s_1. Then the second pose s_1 (i.e., the first pose at the next moment) is input into the strategy; π*(s_1) outputs the action a_1 and the n_1 time steps corresponding to a_1, and the robot arm repeatedly executes action a_1 within n_1 time steps, grasping the target object to a second pose s_2 (i.e., the first pose at the next moment); and so on, until the target object is grasped to the target pose s_t.
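The rollout just described can be summarized as a short control loop. In the sketch below, policy(s) returning (action, n) and the env.observe_pose(), env.execute_single_step() and env.in_target_pose() methods are assumed interfaces used only for illustration:

```python
def grasp_with_policy(policy, env, max_rounds=50):
    """Illustrative control loop: observe the first pose, query the policy for
    (action, n), repeat the single-step trajectory n times, then treat the
    resulting second pose as the next first pose."""
    pose = env.observe_pose()                 # initial first pose s0
    for _ in range(max_rounds):
        action, n = policy(pose)              # pi*(s_t) -> (a_t, n_t)
        for _ in range(n):                    # repeatedly execute the single-step trajectory
            env.execute_single_step(action)
        pose = env.observe_pose()             # second pose, used as the next first pose
        if env.in_target_pose(pose):          # stop once the target pose is reached
            break
    return pose
```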
The strategy of the frequency-adaptive reinforcement learning model may be formed by a multi-layer perceptron, which may be a multi-layer neural network including an input layer, an output layer, and at least one hidden layer.
The n time steps corresponding to the action may be the index of the element with the largest value in the output vector, selected by the argmax function, that is, the index corresponding to the choice that maximizes the total reward of the action and subsequent actions. For example, if the output vector is (1, 1.5, 3.6, 2.0), the index of the maximum element is 3, and the action corresponds to 3 time steps.
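A minimal sketch of such a perceptron-style policy with a separate duration head is shown below; the weight names, shapes, and tanh activations are assumptions, while the argmax-based choice of n follows the description above:

```python
import numpy as np

def mlp_policy(state, w1, b1, w_action, b_action, w_duration, b_duration):
    """One hidden layer feeding an action head and a duration head; the duration
    n is the (1-based) index of the largest duration-head score."""
    hidden = np.tanh(state @ w1 + b1)
    action = np.tanh(hidden @ w_action + b_action)   # continuous arm action (assumed form)
    scores = hidden @ w_duration + b_duration        # one score per candidate duration
    n = int(np.argmax(scores)) + 1                   # e.g. (1, 1.5, 3.6, 2.0) -> n = 3
    return action, n
```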
In this way, each action in the grasping process and the duration corresponding to each action can be determined using the strategy, so that repeated actions can be merged adaptively, the number of inference calls of the reinforcement learning algorithm is saved, and industrial production efficiency is improved.
In summary, in the embodiments of the present disclosure, the first pose of the current target object may be observed at each moment (each round), the action of the robot arm and the duration corresponding to the action are determined, and the robot arm then repeatedly executes the action within the duration to move the target object to the second pose. The target object may be moved from the initial pose to the target pose through multiple rounds of iterative execution, and each action may correspond to a different duration.
In the related art, by contrast, the robot arm needs to be equipped with a dexterous end effector (such as a clamping jaw or a multi-finger manipulator), the grasping tasks are mostly daily-life tasks with low accuracy requirements, and the geometric structure of the object to be grasped is simple. The grasping method of the embodiments of the present disclosure can cover such application scenarios and is also suitable for grasping tasks in industrial scenarios: the end effector of the robot arm can be a single-point touch effector, the target object can be moved to a desired three-dimensional position and a desired three-dimensional rotation angle to improve grasping accuracy, and the method is applicable when the geometric structure of the target object is complex.
Further, in the related art the execution frequency of each action of the robot arm is the same, which results in a large number of repeated actions. The method of the embodiments of the present disclosure helps merge repeated actions adaptively, saving inference calls of the reinforcement learning algorithm and improving industrial production efficiency.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principles and logic described; for brevity, such combinations are not described in detail here. Those skilled in the art will appreciate that, in the methods of the specific embodiments above, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a capturing apparatus, an electronic device, a computer-readable storage medium, and a program, which can be used to implement any one of the capturing methods provided by the present disclosure, and the corresponding technical solutions and descriptions and corresponding descriptions of the method portions are not repeated.
Fig. 6 shows a block diagram of a grasping apparatus according to an embodiment of the present disclosure, as shown in fig. 6, the apparatus including:
an obtaining module 61, configured to obtain a first pose of a target object, where the first pose includes a three-dimensional position of a current state of the target object and a three-dimensional corner of the current state of the target object;
a determining module 62, configured to determine, according to the first pose of the target object, an action of the robot arm and a duration corresponding to the action, where the duration includes at least one time step;
a generating module 63, configured to generate a single-step trajectory of the robot arm according to the action of the robot arm and the first pose of the target object, where the single-step trajectory is a trajectory generated for one time step;
an execution module 64 for repeatedly executing the single step trajectory according to the duration to move the target object to a second pose, the second pose including a three-dimensional position of the target object next state and a three-dimensional corner of the target object next state.
In a possible implementation manner, the determining module 62 is configured to: determining the state of the mechanical arm according to the first pose of the target object, at least one historical pose of the target object and at least one historical action of the mechanical arm; and determining the action of the mechanical arm and the duration corresponding to the action according to the state of the mechanical arm.
In one possible implementation, the determining, according to the state of the robot arm, an action of the robot arm and a duration corresponding to the action includes: determining the action of the mechanical arm and the duration corresponding to the action by utilizing a strategy according to the state of the mechanical arm; wherein the policy is used to adjust the action and the duration corresponding to the action to accommodate each state.
In one possible implementation, the policy includes a multi-layer perceptron in a reinforcement learning model obtained by a reinforcement learning method, where a state space of the reinforcement learning model represents a state set of the robot arm, an action space represents a cartesian product of the action set and a duration set of the robot arm, a state transition probability represents a probability of a next state after a single-step trajectory corresponding to an action is repeatedly executed according to the duration in each state, an initial state distribution represents a probability distribution for generating an initial state, a reward function represents an evaluation after the single-step trajectory corresponding to the action is repeatedly executed according to the duration in each state, and a discount coefficient is used for adjusting the reward function.
In a possible implementation manner, the reward function includes at least one of a success reward function, a motion reward function, a safety reward function and a time penalty reward function, and the success reward function is used for evaluating whether the second pose of the target object belongs to an expected pose set after a single-step track corresponding to an action is repeatedly executed according to the duration in each state; the motion reward function is used for evaluating the distance between the second pose of the target object and an expected pose after the single step track corresponding to the action repeatedly executed according to the duration in each state is evaluated; the safety reward function is used for evaluating the danger degree of the mechanical arm action in each state; the time penalty reward function is used for restricting the reasoning times of the reinforcement learning model.
In a possible implementation manner, the executing module 64 is further configured to, after repeatedly executing the single-step trajectory according to the duration to move the target object to a second pose, determine the second pose of the target object as the first pose of the target object and re-execute the step of acquiring the first pose of the target object and the subsequent steps.
In a possible implementation manner, the generating module 63 is configured to: generate the single-step motion trajectory of the robot arm from the action of the robot arm and the first pose of the target object by using an inverse dynamics method.
In one possible implementation, the robotic arm includes an end effector that includes a single-touch effector.
The method has specific technical relevance with the internal structure of the computer system, and can solve the technical problems of how to improve the hardware operation efficiency or the execution effect (including reducing data storage capacity, reducing data transmission capacity, improving hardware processing speed and the like), thereby obtaining the technical effect of improving the internal performance of the computer system according with the natural law.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
Embodiments of the present disclosure also provide a computer program product, which includes computer readable code or a non-volatile computer readable storage medium carrying computer readable code, when the computer readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 7 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or other terminal device.
Referring to fig. 7, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as wireless network (Wi-Fi), second generation mobile communication technology (2G), third generation mobile communication technology (3G), fourth generation mobile communication technology (4G), long term evolution of universal mobile communication technology (LTE), fifth generation mobile communication technology (5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 8 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server or terminal device. Referring to fig. 8, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), a multi-user multi-process computer operating system (Unix™), a free and open-source Unix-like operating system (Linux™), an open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be interpreted as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through an electrical wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
If the technical scheme of the application relates to personal information, a product applying the technical scheme of the application clearly informs personal information processing rules before processing the personal information, and obtains personal independent consent. If the technical scheme of the application relates to sensitive personal information, a product applying the technical scheme of the application obtains individual consent before processing the sensitive personal information, and simultaneously meets the requirement of 'express consent'. For example, at a personal information collection device such as a camera, a clear and significant identifier is set to inform that the personal information collection range is entered, the personal information is collected, and if the person voluntarily enters the collection range, the person is regarded as agreeing to collect the personal information; or on the device for processing the personal information, under the condition of informing the personal information processing rule by using obvious identification/information, obtaining personal authorization by modes of popping window information or asking a person to upload personal information of the person by himself, and the like; the personal information processing rule may include information such as a personal information processor, a personal information processing purpose, a processing method, and a type of personal information to be processed.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (11)
1. A method of grasping, comprising:
acquiring a first pose of a target object, wherein the first pose comprises a three-dimensional position of the current state of the target object and a three-dimensional corner of the current state of the target object;
determining, according to the first pose of the target object, an action of a mechanical arm and a duration corresponding to the action, wherein the duration comprises at least one time step;
generating a single-step trajectory of the mechanical arm according to the action of the mechanical arm and the first pose of the target object, wherein the single-step trajectory is a trajectory generated for one time step;
repeatedly executing the single-step trajectory in accordance with the duration to move the target object to a second pose, the second pose including a three-dimensional position of a next state of the target object and a three-dimensional corner of the next state of the target object.
2. The method of claim 1, wherein the determining, according to the first pose of the target object, an action of the mechanical arm and a duration corresponding to the action comprises:
determining the state of the mechanical arm according to the first pose of the target object, at least one historical pose of the target object and at least one historical action of the mechanical arm;
and determining the action of the mechanical arm and the duration corresponding to the action according to the state of the mechanical arm.
3. The method of claim 2, wherein the determining, according to the state of the mechanical arm, the action of the mechanical arm and the duration corresponding to the action comprises:
determining the action of the mechanical arm and the duration corresponding to the action by utilizing a strategy according to the state of the mechanical arm;
wherein the policy is used to adjust the action and the duration corresponding to the action to accommodate each state.
4. The method of claim 3, wherein the strategy comprises a multi-layered perceptron in a reinforcement learning model obtained using a reinforcement learning method,
the state space of the reinforcement learning model represents a state set of the mechanical arm, the action space represents a Cartesian product of the action set of the mechanical arm and a duration set, the state transition probability represents the probability of a next state after a single step track corresponding to an action is repeatedly executed according to the duration in each state, the initial state distribution represents the probability distribution for generating the initial state, the reward function represents evaluation after the single step track corresponding to the action is repeatedly executed according to the duration in each state, and the discount coefficient is used for adjusting the reward function.
5. The method of claim 4, wherein the reward function comprises at least one of a success reward function, a sport reward function, a security reward function, a time penalty reward function,
the successful reward function is used for evaluating whether the second pose of the target object belongs to an expected pose set after the single-step track corresponding to the action is repeatedly executed according to the duration in each state;
the motion reward function is used for evaluating the distance between the second pose of the target object and an expected pose after the single step track corresponding to the action repeatedly executed according to the duration in each state is evaluated;
the safety reward function is used for evaluating the danger degree of the mechanical arm action in each state;
the time penalty reward function is used for restricting the reasoning times of the reinforcement learning model.
6. The method of any one of claims 1-5, wherein after repeatedly executing the single-step trajectory according to the duration to move the target object to a second pose, the method further comprises:
and determining the second pose of the target object as the first pose of the target object, and re-executing the steps of acquiring the first pose of the target object and the subsequent steps.
7. The method of any of claims 1-5, wherein the generating a single-step trajectory of the mechanical arm according to the action of the mechanical arm and the first pose of the target object comprises:
generating the single-step motion trajectory of the mechanical arm from the action of the mechanical arm and the first pose of the target object by using an inverse dynamics method.
8. The method of any of claims 1-7, wherein the robotic arm comprises an end effector comprising a single-touch effector.
9. A grasping device, comprising:
an acquisition module, configured to acquire a first pose of a target object, wherein the first pose comprises a three-dimensional position of the current state of the target object and a three-dimensional corner of the current state of the target object;
a determining module, configured to determine, according to the first pose of the target object, an action of a mechanical arm and a duration corresponding to the action, wherein the duration comprises at least one time step;
a generating module, configured to generate a single-step trajectory of the mechanical arm according to the action of the mechanical arm and the first pose of the target object, wherein the single-step trajectory is a trajectory generated for one time step;
and the execution module is used for repeatedly executing the single-step track according to the duration so as to enable the target object to move to a second pose, wherein the second pose comprises a three-dimensional position of the next state of the target object and a three-dimensional corner of the next state of the target object.
10. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any one of claims 1 to 8.
11. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210456336.3A CN114833830A (en) | 2022-04-27 | 2022-04-27 | Grabbing method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114833830A true CN114833830A (en) | 2022-08-02 |
Family
ID=82568657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210456336.3A Withdrawn CN114833830A (en) | 2022-04-27 | 2022-04-27 | Grabbing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114833830A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150231786A1 (en) * | 2014-02-19 | 2015-08-20 | Toyota Jidosha Kabushiki Kaisha | Movement control method for mobile robot |
CN106956261A (en) * | 2017-04-11 | 2017-07-18 | 华南理工大学 | A kind of man-machine interaction mechanical arm system and method with security identification zone |
CN108803499A (en) * | 2017-04-28 | 2018-11-13 | 发那科株式会社 | control device and machine learning device |
CN111325768A (en) * | 2020-01-31 | 2020-06-23 | 武汉大学 | Free floating target capture method based on 3D vision and simulation learning |
CN112307898A (en) * | 2020-09-27 | 2021-02-02 | 中国电力科学研究院有限公司 | Data glove action segmentation method and device in virtual reality environment |
US20210201156A1 (en) * | 2018-05-18 | 2021-07-01 | Google Llc | Sample-efficient reinforcement learning |
WO2021218683A1 (en) * | 2020-04-30 | 2021-11-04 | 华为技术有限公司 | Image processing method and apparatus |
CN113807460A (en) * | 2021-09-27 | 2021-12-17 | 北京地平线机器人技术研发有限公司 | Method and device for determining intelligent body action, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20220802 |