
CN114851201B - Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction - Google Patents

Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction

Info

Publication number
CN114851201B
CN114851201B CN202210551360.5A
Authority
CN
China
Prior art keywords
grabbing
mechanical arm
camera
coordinate system
voxel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210551360.5A
Other languages
Chinese (zh)
Other versions
CN114851201A (en)
Inventor
欧林林
徐靖
禹鑫燚
周利波
魏岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210551360.5A priority Critical patent/CN114851201B/en
Publication of CN114851201A publication Critical patent/CN114851201A/en
Application granted granted Critical
Publication of CN114851201B publication Critical patent/CN114851201B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1679Programme controls characterised by the tasks executed
    • B25J9/1692Calibration of manipulator
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a six-degree-of-freedom visual closed-loop grabbing method for a mechanical arm based on TSDF three-dimensional reconstruction. The method comprises the following steps: step one: calibrate the mechanical arm base coordinate system and the camera coordinate system with the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method; step two: perform three-dimensional scene reconstruction on the acquired image information with a TSDF function to reduce environmental noise between objects; step three: establish a reinforcement learning network model; step four: back-project the predicted grabbing pose of the end effector into the three-dimensionally reconstructed scene and judge the predicted grabbing quality; step five: complete the grabbing motion of the mechanical arm through robot forward and inverse kinematics; step six: train the reinforcement learning model so that the mechanical arm completes the grabbing action. The invention overcomes the defects of the prior art and provides a mechanical arm six-degree-of-freedom visual closed-loop grabbing system based on TSDF three-dimensional reconstruction that is easy to implement and highly applicable. The system reduces the environmental noise caused by occlusion and stacking among objects and the depth error caused by interference on a single vision sensor. In addition, the system achieves fast real-time target detection and completes the grabbing action while maintaining high precision.

Description

Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction
Technical Field
The invention belongs to the field of six-degree-of-freedom object grabbing by a mechanical arm in an unstructured environment, and in particular relates to robot grabbing control.
Background
Target grabbing has always been an important problem in robotics, yet no fully satisfactory solution has emerged. A versatile control system for the mechanical arm makes it possible to flexibly control a high-degree-of-freedom end effector in three-dimensional space, enabling flexible grabbing of objects and dynamic response to environmental changes. Recently, with the rapid development of deep learning and reinforcement learning and the construction of corresponding systems, various feasible approaches have been proposed for intelligent mechanical arm grabbing.
Although 6-degree-of-freedom grabbing control of a mechanical arm has high practical value in complex operating environments, most current data-driven grabbing algorithms only grab top-down (4-DoF: x, y, z, yaw) in a simple tabletop setup, or use physical analysis to derive a suitable grabbing pose. Because the motion is limited to three-dimensional translation plus a single rotation, such algorithms are restricted to narrow application scenarios: the mechanical arm end effector can only approach the object from a vertically downward direction, and in some cases it cannot grab the object along that direction. For example, it is difficult for a parallel gripper to grasp a horizontally placed plate from above. Furthermore, methods based on physical analysis not only require a large amount of experimental environment data but also need an accurate target model for estimation, which incurs large time and computation costs; moreover, the physical model often does not generalize to unstructured target objects, so most such algorithms are difficult to apply directly to grabbing novel objects, and the robustness and generalization of the system are poor.
Hence the idea of 6-degree-of-freedom (6-DOF) mechanical arm grabbing was proposed. The PointNetGPD method proposed by Hongzhuo Liang uses a two-step sample-and-evaluate approach, determining a reliable grabbing pose by evaluating a large number of samples; however, this method is quite time consuming. (Hongzhuo Liang et al. "PointNetGPD: Detecting grasp configurations from point sets". In: 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 3629-3635). Florence et al. transfer poses from existing grasp poses; however, the success rate for target objects and object geometries not recorded in the data set is relatively low, so the algorithm cannot be generalized to new application scenarios. (Peter Florence, Lucas Manuelli, and Russ Tedrake. "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation". In: Conference on Robot Learning (CoRL), 2018). More recently, a point cloud scene is reconstructed from partial views acquired by an RGB-D camera; after object features are extracted with a designed PointNet++ network model, 6-DoF grabbing pose regression is completed directly and grabbing planning is performed on the reconstructed point cloud scene. (P. Ni, W. Zhang, X. Zhu, and Q. Cao, "Learning an end-to-end spatial grasp generation and refinement algorithm from simulation," Machine Vision and Applications, vol. 32, no. 1, pp. 1-12, 2021).
Disclosure of Invention
The invention overcomes the defects of the prior art and provides a six-degree-of-freedom grabbing method for a mechanical arm that is easy to implement and highly applicable. The invention designs a feature extraction network and a Policy Gradient reinforcement learning framework, and can locate and plan six-degree-of-freedom grabs for objects not recorded in the training data set.
The invention takes an image sequence as input. First, the mechanical arm coordinate system and the camera coordinate system are established through the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method. Color and depth information of the grabbing operation table is then acquired with an Intel RealSense D435 depth camera. The acquired image information is used for three-dimensional scene reconstruction through a Truncated Signed Distance Function (TSDF) to reduce environmental noise between objects. A two-dimensional top-down image of the reconstructed scene is rendered and used as the input of the designed grabbing neural network model; the model encodes the input image, extracts features, and outputs a predicted grabbing area of the current environment together with a grabbing pose for the mechanical arm. The network model structure is shown in figure 1. The model adopts a ResNet fully convolutional feature extraction network architecture and learns to extract unstructured features from RGB-D partial observations. The model channel-concatenates the color features and depth features, interleaving 2 additional Batch Normalization (BN) layers, nonlinear activation functions (ReLU), and 1×1 convolutional layers. Feature embeddings of the same pixel unit are then aggregated with average pooling and linearly up-sampled so that convolutional features are shared across positions and orientations and the output picture matches the input resolution. The coordinates of the hotspot block with the maximum grabbing probability are selected and mapped to the hemispherical three-dimensional grabbing pose points defined before the experiment (see fig. 2), and the corresponding 6-dimensional pose is taken as the predicted 6-degree-of-freedom grabbing pose. The predicted grabbing pose output by the model is then back-projected into the reconstructed three-dimensional scene; a depth image under the current pose is rendered from the three-dimensional scene with the TSDF function, and the grabbing quality is judged from the depth information at the two ends of the gripper. Finally, when the depth of the target coordinate point in the rendered image is smaller than or equal to the depth of the fingertips at the two ends of the gripper (15 cm), the predicted pose is fed into the robot forward kinematics to obtain the corresponding mechanical arm motion trajectory, and the mechanical arm completes the grabbing action. If the grab succeeds, the agent reward is assigned a value of 1; otherwise it is assigned 0. The system grabbing flow chart is shown in fig. 3.
The invention discloses a six-degree-of-freedom visual closed-loop grabbing method of a mechanical arm based on TSDF three-dimensional reconstruction, which comprises the following specific steps:
step 1: zhang Zhengyou camera calibration method and ArUco Markers calibration method are used for calibrating a mechanical arm base coordinate system and a camera coordinate system:
Firstly, an Intel D415 depth camera is vertically fixed at the end of the mechanical arm, so that the camera and the jaws of the end effector are related by a fixed pose transformation and the binocular camera can acquire image information of objects on the grabbing operation table. Then the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method are used to calibrate the internal and external parameters of the camera, and a three-dimensional coordinate system with the base of the mechanical arm as the origin is constructed. The camera internal and external parameter calibration formulas are as follows:
where x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; (u, v) are the pixel coordinates; (x_m, y_m, z_m) is a point in the assumed camera coordinate system; (x_p, y_p, z_p) is the imaging point in the image coordinate system; f is the imaging distance; d_x, d_y are the coordinates of the pixel point in the image coordinate system; R and T are the rotation matrix and the translation matrix. Equation (1) is the formula from the world coordinate system to the camera coordinate system, equation (2) is the formula from the camera coordinate system to the ideal image coordinate system, and equation (3) is the matrix from camera coordinates to pixel coordinates, i.e. the intrinsic parameter matrix.
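For reference, the following is a minimal sketch of the coordinate chain described by equations (1)-(3) under a standard pinhole model; the intrinsics fx, fy, cx, cy and the extrinsics R, T stand in for the calibrated parameters and are assumptions, not values from the patent.

```python
import numpy as np

def world_to_pixel(p_world, R, T, fx, fy, cx, cy):
    """Project a 3D point in the world (arm-base) frame to pixel coordinates (u, v)."""
    p_cam = R @ p_world + T                  # eq. (1): world frame -> camera frame
    x_p = fx * p_cam[0] / p_cam[2]           # eq. (2)-(3): perspective projection
    y_p = fy * p_cam[1] / p_cam[2]           # and scaling by the intrinsic parameters
    return np.array([x_p + cx, y_p + cy])    # pixel coordinates

# Example with identity extrinsics (camera frame coincides with the world frame):
R, T = np.eye(3), np.zeros(3)
print(world_to_pixel(np.array([0.1, 0.0, 0.5]), R, T, 600.0, 600.0, 320.0, 240.0))
```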
Step 2: three-dimensional scene reconstruction is carried out on the acquired image information by using a TSDF function so as to reduce the environmental noise points between objects:
step 2.1: in the invention, a single sensor is adopted to acquire visual information in an actual experiment, and information (color information and depth information) captured by a binocular camera on a grabbing operation table is acquired through OpenCV.
Step 2.2: due to the light rays, the environmental factors such as mutual shielding among objects and the like cause larger noise points in the depth image acquired by the sensor, so that the grabbing prediction of the network model on the image is affected. Firstly, the spherical point sampling is carried out on the grabbing operation platform before each grabbing, and the depth error in a single image is reduced by sampling partial images for multiple times. And secondly, fusing the sampling point posture and the sampling image information into a three-dimensional reconstruction voxel grid by using a TSDF function. For a single voxel, not only the intuitive x, y, z coordinates are included, but the distance of the voxel to the outermost surface is represented by two other values (SDF and CAM). Where SDF is a signed distance function used for depth map fusion, it uses 0 to represent surface, positive value to represent model external points, negative value to represent model internal points, and the value is proportional to the distance of the point to the nearest surface. The surface is easily extracted as a zero crossing of the function.
Step 2.3: defining the coordinate system of the voxel grid as a world coordinate system, wherein the voxel coordinates can be expressed asThe pose of the camera is->The internal reference matrix of the camera is K.
According to the transformation matrix obtained by ICP registration, the voxel grid in the world coordinate system is projected into the camera coordinate system and then, using the camera intrinsic matrix K, converted into the image coordinate system as I_ij, where i ∈ (0, 1, ..., n) and j ∈ (0, 1, ..., m).
After projecting the voxel grid into the image coordinate system, the initial SDF value of each voxel is first calculated, as shown in formula (18),
where V_j^w represents the position of the j-th voxel in the world coordinate system, p_i^w represents the position of the i-th camera pose in the world coordinate system, and dep(I_ij) represents the depth value of the j-th voxel in the i-th camera's depth image.
The SDF value of each voxel is then truncated according to equation (19).
where tsdf_i is the truncated distance value of the voxel and trunc is the manually set truncation distance, which can be understood as the error bound of the camera's depth information: if the error is larger, this value should be set larger, otherwise much of the information acquired by the depth camera may be lost. Here trunc is set to 1.
After truncating the SDF value of each voxel, the TSDF value of each voxel is recalculated according to equation (20):
where ω_j(x, y, z) is the weight of the voxel in the global voxel grid of the current frame, ω_{j-1}(x, y, z) is the weight of the voxel in the voxel grid of the previous frame, tsdf_j is the distance from the voxel to the object surface in the global data cube of the current frame, tsdf_{j-1} is the distance from the voxel to the object surface in the global data cube of the previous frame, V.z represents the z-axis coordinate of the voxel in the camera coordinate system, and D(u, v) represents the depth value at pixel (u, v) of the current frame depth image.
Equation (9) calculates a weight value for each voxel.
w_j = w_{j-1} + w_j (9)
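By way of illustration, the following is a minimal sketch of the per-voxel fusion described in steps 2.2-2.3, assuming the signed distance is taken as the difference between the measured depth and the camera-to-voxel distance and that per-frame weights are simply accumulated; variable names are illustrative, not the patent's.

```python
import numpy as np

def tsdf_update(voxel_w, cam_pos_w, depth_meas, tsdf_prev, w_prev, trunc=1.0):
    """Fuse one depth observation of a single voxel into the TSDF grid.

    voxel_w    : voxel centre in the world frame
    cam_pos_w  : camera position in the world frame
    depth_meas : dep(I_ij), measured depth at the voxel's projected pixel
    tsdf_prev  : TSDF value of this voxel from the previous frame
    w_prev     : accumulated weight of this voxel from the previous frame
    """
    # Signed distance: positive outside the surface, negative inside.
    sdf = depth_meas - np.linalg.norm(cam_pos_w - voxel_w)
    # Truncate to [-trunc, trunc] (equation (19)-style truncation).
    tsdf_cur = np.clip(sdf / trunc, -1.0, 1.0)
    # Weighted running average over frames (equation (20)-style fusion).
    w_cur = 1.0
    tsdf_new = (w_prev * tsdf_prev + w_cur * tsdf_cur) / (w_prev + w_cur)
    return tsdf_new, w_prev + w_cur

# Example: a voxel 0.48 m from the camera observed at depth 0.50 m lies just
# outside the surface, so its TSDF takes a small positive value.
print(tsdf_update(np.array([0.0, 0.0, 0.48]), np.zeros(3), 0.50, 0.0, 0.0))
```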
The environment information of the grabbing operation table is then rendered from a top-down view and used as the input of the network model.
Step 3: establishing a reinforcement learning network model:
step 3.1: design the mechanical arm grabbing strategy and define the action space for reinforcement learning. The invention designs a method that directly regresses the 6-DoF grabbing pose by using a two-dimensional convolutional network with increased network dimensionality. The defined gripper grabbing actions are shown in figure 2.
Step 3.2: each element in the array is converted into an angle that the tail end of the mechanical arm rotates around three coordinate axes of x, y and z respectively, and a specific conversion formula is as follows:
a_x = ((best_pix_ind[0] - 14) * 30/28) - pi (10)
b_y = ((best_pix_ind[1] - 14) * 30/28) (11)
r_z = (best_pix_ind[2] * θ) (12)
where a_x is the rotation angle of the mechanical arm end about the x-axis, i.e. the roll angle of the end effector; b_y is the rotation angle of the mechanical arm end about the y-axis, i.e. the pitch angle of the end effector; r_z is the rotation angle of the mechanical arm end about the z-axis, i.e. the yaw angle of the end effector; θ is the bias value set for the experiments.
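As an illustrative sketch, the index-to-angle conversion can be written with the concrete constants stated in the claims (28 bins of width 30/28 for the x- and y-axis rotations, θ = 180/16 per z-axis bin); the unit convention for the pi offset follows the patent text and is an assumption here.

```python
import math

def index_to_pose_angles(best_pix_ind, theta=180.0 / 16.0):
    """Map an action-space index (i, j, k) to end-effector rotation angles, eqs. (10)-(12)."""
    a_x = (best_pix_ind[0] - 14) * 30.0 / 28.0 - math.pi   # roll  about x, eq. (10)
    b_y = (best_pix_ind[1] - 14) * 30.0 / 28.0             # pitch about y, eq. (11)
    r_z = best_pix_ind[2] * theta                          # yaw   about z, eq. (12)
    return a_x, b_y, r_z

# Central roll/pitch bin, 8th yaw bin (8 * 11.25 = 90 with theta = 180/16):
print(index_to_pose_angles((14, 14, 8)))
```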
Step 3.3: designing a characteristic extraction network model: the model adopts a Resnet full convolution feature extraction network architecture, and learns to extract unstructured features from RGB-D part observation. After the multi-layer feature extraction convolution layer extracts the features of the color image and depth image, the color features and depth features are channel connected by a Concat function, with 2 additional Batch Normalization (BN), nonlinear activation functions (ReLU), 1 x 1Convolutional Layers, and upsampling to keep the output thermodynamic diagram resolution consistent with the input image. And selecting the corresponding gesture in the action space corresponding to the maximum vector value coordinate in the tensor graph as the predicted grabbing gesture to output.
Step 3.4: forward reasoning is performed on the network by the following formula:
where equation (13) represents the expected return under state s and action a, g_t denotes the grabbing action taken at time t, s_t the state at time t, and r_t the return at time t; equation (14) represents the overall return function of the network; equation (15) is the state distribution function; equation (16) represents the state-action function.
Step 3.5: design the reinforcement learning network loss function, adopting a cross-entropy loss computation to further improve the detection accuracy of the algorithm:
where τ = s_0 a_0 s_1 a_1 ... s_n a_n denotes a Markov process.
Since P_r{a|s} = π(s, a), equation (18) can be obtained.
The weight update function is as follows:
where f_ω: S × A → R is an approximation of the state-action function; when f_ω attains its minimum, Δω = 0, and equation (21) can be derived.
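By way of illustration, the following is a minimal sketch of a policy-gradient style update consistent with steps 3.4-3.6: a cross-entropy (log-likelihood) term weighted by the binary grasp reward. It is a generic REINFORCE-style surrogate, not the patent's exact equations (13)-(21).

```python
import torch
import torch.nn.functional as F

def policy_gradient_loss(logits, action_index, reward):
    """REINFORCE-style surrogate loss: negative log-probability of the executed
    grasp action, weighted by its reward (1 for a successful grasp, 0 otherwise).

    logits       : (1, A) unnormalised scores over the flattened action space
    action_index : index of the grasp pose that was executed
    reward       : scalar return for that grasp attempt
    """
    log_prob = F.log_softmax(logits, dim=-1)[0, action_index]
    return -reward * log_prob   # minimising this ascends the expected return

# Illustrative usage with a policy network `net` such as the sketch in step 3.3:
# logits = net(rgb, depth).flatten(1)
# loss = policy_gradient_loss(logits, chosen_index, reward)
# loss.backward(); optimizer.step()
```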
Step 3.6: setting reinforcement learning rewards: if the grabbing is successful, the value is assigned to the intelligent agent rewards 1, otherwise, the value is assigned to the intelligent agent rewards 0.
Step 4: the predicted grabbing pose of the end effector is back-projected into the three-dimensionally reconstructed scene, and the predicted grabbing quality is judged: the 6-degree-of-freedom grabbing pose output by the network is back-projected into the three-dimensional voxel scene established before grabbing, and a depth image under the predicted pose is rendered through view-frustum projection, as shown in figure 4. The view frustum is a three-dimensional volume whose position is related to the camera's extrinsic matrix; its shape determines how the scene is projected onto the image plane. Perspective projection uses a pyramid as the viewing volume, with the camera located at the apex of the pyramid. The pyramid is truncated by front and rear planes to form the frustum, called the View Frustum; its model is shown in figure 5, and only geometry located inside the frustum is visible. In this clipping space there are two planes, called the near clipping plane and the far clipping plane, respectively.
By comparing the depth information at the two ends of the gripper in the rendered image, the distance between the parallel jaws and the object is inferred, from which the grabbing quality of the predicted grabbing pose is judged, forming the closed-loop feedback of the 6-degree-of-freedom mechanical arm grabbing system.
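By way of illustration, a minimal sketch of this depth-based quality check, assuming a depth image rendered from the reconstructed scene at the predicted grasp pose and the 15 cm fingertip depth used in the embodiment; the fingertip pixel coordinates are illustrative.

```python
import numpy as np

def grasp_quality_ok(rendered_depth, left_tip_uv, right_tip_uv, fingertip_depth=0.15):
    """Closed-loop quality check from step 4: accept the grasp when the rendered
    scene depth at both fingertip positions is at most the fingertip depth."""
    d_left = rendered_depth[left_tip_uv[1], left_tip_uv[0]]
    d_right = rendered_depth[right_tip_uv[1], right_tip_uv[0]]
    return d_left <= fingertip_depth and d_right <= fingertip_depth

# Example with a synthetic 64x64 rendered depth map.
depth = np.full((64, 64), 0.30)          # background 30 cm from the gripper
depth[28:36, 20:44] = 0.10               # object surface 10 cm from the gripper
print(grasp_quality_ok(depth, (22, 32), (42, 32)))   # True: surface within fingertip reach at both tips
```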
Step 5: the mechanical arm grabbing motion is completed through robot forward and inverse kinematics:
First, when the depth at the two ends of the gripper in the depth image rendered in step 4 is smaller than or equal to the fingertip depth of the gripper, i.e. the predicted gripper pose straddles the two sides of the object, the predicted pose is judged graspable. Then the mechanical arm is controlled to execute the grabbing plan according to the predicted grabbing pose: the 6 joint angles of the mechanical arm in the current state are solved through robot inverse kinematics. Next, the predicted 6-dimensional grabbing pose of the mechanical arm end output by the reinforcement learning network in step 3 is fed into the robot forward kinematics to obtain the motion trajectory of the mechanical arm end effector from the current pose to the predicted pose point. After the end effector moves to the predicted grabbing pose, the robot sends a gripper-closing signal to attempt the grab. After the gripper closes, the end effector moves vertically upwards by 15 cm, an upward-moved depth image of the effector is obtained from the Intel RealSense D435 camera, and whether the actual grab succeeded is judged by computing the depth at the two ends of the gripper. When the grab succeeds, the reinforcement learning model reward is assigned a value of 1; when the grab fails, the reinforcement learning network reward is assigned a value of 0.
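By way of illustration, a minimal sketch of this execution sequence; the robot and camera methods (solve_ik, plan_to_pose, move_along, close_gripper, lift, read_depth_image) are hypothetical placeholders for the UR5 control and camera interfaces, not a real API, and grasp_quality_ok refers to the sketch shown in step 4.

```python
def execute_grasp(pred_pose, robot, camera, tip_pixels, fingertip_depth=0.15):
    """Step 5 sketch: plan with inverse/forward kinematics, attempt the grasp, verify by re-imaging."""
    joints_now = robot.solve_ik(robot.current_pose())          # 6 joint angles in the current state
    trajectory = robot.plan_to_pose(joints_now, pred_pose)     # forward-kinematics path to the predicted pose
    robot.move_along(trajectory)
    robot.close_gripper()                                      # attempt the grab
    robot.lift(0.15)                                           # move 15 cm vertically upwards
    depth_image = camera.read_depth_image()                    # re-observe after lifting
    success = grasp_quality_ok(depth_image, tip_pixels[0], tip_pixels[1], fingertip_depth)
    return 1 if success else 0                                 # reward for the reinforcement learning model
```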
Step 6: training the reinforcement learning model to enable the mechanical arm to complete grabbing actions:
and (5) continuously repeating the steps 2 to 5 in the CoppelianSim reduced simulation environment, and updating the model weight parameters by reducing the loss function in the reinforcement learning model through reinforcement learning rewards. Finally, the trained weight parameters are imported into a real mechanical arm UR5, experimental debugging is carried out, the steps 1 to 5 are repeated, and the six-degree-of-freedom closed-loop grabbing task of the mechanical arm is completed.
In summary, the advantage of the method is that the 6-dimensional grabbing pose of the mechanical arm is predicted directly by increasing the dimensionality of a two-dimensional neural network. Meanwhile, the invention performs three-dimensional reconstruction of the complex grabbing environment through the TSDF function, reducing the environmental noise caused by occlusion and stacking among objects and the depth error caused by interference on a single vision sensor. A reinforcement learning network is designed for this method, overcoming the drawbacks of cumbersome computation and high time cost when deriving the mechanical arm grabbing pose through traditional physical analysis, and solving the problem that a 6-DOF mechanical arm grabbing pose cannot be applied to target objects not recorded in the data set. The method not only ensures a high grabbing success rate of the mechanical arm model but also benefits the generalization of reinforcement learning, i.e. it can be applied to new grabbing objects; it avoids the time-consuming computation of traditional methods and reduces the instability of partial input point cloud models. The invention realizes real-time detection of the grabbed object and 6-DOF grabbing.
Drawings
FIG. 1 is a block diagram of a network model in the present invention;
FIG. 2 is a diagram of hemispherical three-dimensional gripping attitude angles defined by the present invention;
FIG. 3 is a system capture flow diagram of the present invention;
FIG. 4 is a perspective cone projection view;
FIG. 5 is a View Frustum model diagram;
FIG. 6 is an illustration of an experimental grabbing example;
Detailed Description
The invention is further described below with reference to the drawings.
The invention relates to a six-degree-of-freedom visual closed-loop grabbing method for a mechanical arm based on TSDF three-dimensional reconstruction; an example of object grabbing is shown in fig. 6, and the specific process is as follows:
step 1: zhang Zhengyou camera calibration method and ArUco Markers calibration method are used for calibrating a mechanical arm base coordinate system and a camera coordinate system:
Firstly, an Intel D415 depth camera is vertically fixed at the end of the mechanical arm, so that the camera and the jaws of the end effector are related by a fixed pose transformation and the binocular camera can acquire image information of objects on the grabbing operation table. Then the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method are used to calibrate the internal and external parameters of the camera, and a three-dimensional coordinate system with the base of the mechanical arm as the origin is constructed. The camera internal and external parameter calibration formulas are as follows:
where x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; (u, v) are the pixel coordinates; (x_m, y_m, z_m) is a point in the assumed camera coordinate system; (x_p, y_p, z_p) is the imaging point in the image coordinate system; f is the imaging distance; d_x, d_y are the coordinates of the pixel point in the image coordinate system; R and T are the rotation matrix and the translation matrix. Equation (1) is the formula from the world coordinate system to the camera coordinate system, equation (2) is the formula from the camera coordinate system to the ideal image coordinate system, and equation (3) is the matrix from camera coordinates to pixel coordinates, i.e. the intrinsic parameter matrix.
Step 2: three-dimensional scene reconstruction is carried out on the acquired image information by using a TSDF function so as to reduce the environmental noise points between objects:
step 2.1: in the invention, a single sensor is adopted to acquire visual information in an actual experiment, and information (color information and depth information) captured by a binocular camera on a grabbing operation table is acquired through OpenCV.
Step 2.2: due to the light rays, the environmental factors such as mutual shielding among objects and the like cause larger noise points in the depth image acquired by the sensor, so that the grabbing prediction of the network model on the image is affected. Firstly, the spherical point sampling is carried out on the grabbing operation platform before each grabbing, and the depth error in a single image is reduced by sampling partial images for multiple times. And secondly, fusing the sampling point posture and the sampling image information into a three-dimensional reconstruction voxel grid by using a TSDF function. For a single voxel, not only the intuitive x, y, z coordinates are included, but the distance of the voxel to the outermost surface is represented by two other values (SDF and CAM). Where SDF is a signed distance function used for depth map fusion, it uses 0 to represent surface, positive value to represent model external points, negative value to represent model internal points, and the value is proportional to the distance of the point to the nearest surface. The surface is easily extracted as a zero crossing of the function. If the depth value D (u, V) at the i-th frame depth image D (u, V) is not 0, comparing D (u, V) with z in the voxel camera coordinates V (x, y, z), if D (u, V) is greater than z, indicating that this voxel is closer to the camera for reconstructing the scene outer surface; if D (u, v) is less than z, this voxel is illustrated as being farther from the camera, reconstructing the scene interior surface. I.e. the voxel point is located outside the surface (closer to the camera side), the SDF value is positive and the voxel is inside the surface, the SDF value is negative.
Step 2.3: defining the coordinate system of the voxel grid as a world coordinate system, wherein the voxel coordinates can be expressed asThe pose of the camera is->The internal reference matrix of the camera is K.
According to the transformation matrix obtained by ICP registration, the voxel grid in the world coordinate system is projected into the camera coordinate system and then, using the camera intrinsic matrix K, converted into the image coordinate system as I_ij, where i ∈ (0, 1, ..., n) and j ∈ (0, 1, ..., m).
After projecting the voxel grid into the image coordinate system, the initial SDF value of each voxel is first calculated, as shown in formula (18),
where V_j^w represents the position of the j-th voxel in the world coordinate system, p_i^w represents the position of the i-th camera pose in the world coordinate system, and dep(I_ij) represents the depth value of the j-th voxel in the i-th camera's depth image.
The SDF value of each voxel is then truncated according to equation (19).
The term trunc represents the manually set truncation distance, which can be understood as the error bound of the camera's depth information: if the error is larger, this value should be set larger, otherwise much of the information acquired by the depth camera may be lost; trunc is set to 1.
After truncating the SDF value of each voxel, the TSDF value of each voxel is recalculated according to equation (20):
where ω_j(x, y, z) is the weight of the voxel in the global voxel grid of the current frame, ω_{j-1}(x, y, z) is the weight of the voxel in the voxel grid of the previous frame, tsdf_j is the distance from the voxel to the object surface in the global data cube of the current frame, tsdf_{j-1} is the distance from the voxel to the object surface in the global data cube of the previous frame, V.z represents the z-axis coordinate of the voxel in the camera coordinate system, and D(u, v) represents the depth value at pixel (u, v) of the current frame depth image.
Equation (9) calculates a weight value for each voxel.
w_j = w_{j-1} + w_j (9)
The environment information of the grabbing operation table is then rendered from a top-down view and used as the input of the network model.
Step 3: establishing a reinforcement learning network model:
step 3.1: design the mechanical arm grabbing strategy and define the action space for reinforcement learning. Target grabbing is typically divided into a grabbing position area and a three-dimensional rotation amount, each predicted by a separate network model. The invention instead designs and uses a two-dimensional convolutional network that directly regresses the 6-DoF grabbing pose by increasing the network dimensionality. The grabbing action space for a single object is 16 × 28 × 28 actions, where 16 is the number of rotation actions of the end effector about the z coordinate axis (22.5 degrees each) and 28 × 28 is the number of rotation actions of the end effector about the x and y coordinate axes. The defined gripper grabbing actions are shown in figure 2.
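As a small illustrative sketch, a flat action index could be decomposed back into the x-, y- and z-rotation bins of this 16 × 28 × 28 action space as follows; the indexing order is an assumption, not the patent's.

```python
import numpy as np

N_X, N_Y, N_Z = 28, 28, 16                   # x-axis bins, y-axis bins, z-axis bins
print(N_X * N_Y * N_Z)                       # 12544 candidate grasp rotations

flat_index = 5000                            # e.g. argmax over the network output
ix, iy, iz = np.unravel_index(flat_index, (N_X, N_Y, N_Z))
best_pix_ind = (ix, iy, iz)                  # feeds equations (10)-(12) in step 3.2
print(best_pix_ind)
```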
Step 3.2: each element in the array is converted into an angle that the tail end of the mechanical arm rotates around three coordinate axes of x, y and z respectively, and a specific conversion formula is as follows:
a_x = ((best_pix_ind[0] - 14) * 30/28) - pi (10)
b_y = ((best_pix_ind[1] - 14) * 30/28) (11)
r_z = (best_pix_ind[2] * θ) (12)
where a_x is the rotation angle of the mechanical arm end about the x-axis, i.e. the roll angle of the end effector; b_y is the rotation angle of the mechanical arm end about the y-axis, i.e. the pitch angle of the end effector; r_z is the rotation angle of the mechanical arm end about the z-axis, i.e. the yaw angle of the end effector.
In the actual grabbing example, the deviation value θ = 180/16 is substituted and the predicted grabbing pose is calculated.
Step 3.3: designing a characteristic extraction network model: the model adopts a Resnet full convolution feature extraction network architecture, and learns to extract unstructured features from RGB-D part observation. After the multi-layer feature extraction convolution layer extracts the features of the color image and depth image, the color features and depth features are channel connected by a Concat function, with 2 additional Batch Normalization (BN), nonlinear activation functions (ReLU), 1 x 1Convolutional Layers, and upsampling to keep the output thermodynamic diagram resolution consistent with the input image. And selecting the corresponding gesture in the action space corresponding to the maximum vector value coordinate in the tensor graph as the predicted grabbing gesture to output.
Step 3.4: forward reasoning is performed on the network by the following formula:
where equation (13) represents the expected return under state s and action a, g_t denotes the grabbing action taken at time t, s_t the state at time t, and r_t the return at time t; equation (14) represents the overall return function of the network; equation (15) is the state distribution function; equation (16) represents the state-action function.
Step 3.5: design the reinforcement learning network loss function, adopting a cross-entropy loss computation to further improve the detection accuracy of the algorithm:
where τ = s_0 a_0 s_1 a_1 ... s_n a_n denotes a Markov process.
Since P_r{a|s} = π(s, a), equation (18) can be obtained.
The weight update function is as follows:
where f_ω: S × A → R is an approximation of the state-action function; when f_ω attains its minimum, Δω = 0, and equation (21) can be derived.
Step 3.6: setting reinforcement learning rewards: if the grabbing is successful, the value is assigned to the intelligent agent rewards 1, otherwise, the value is assigned to the intelligent agent rewards 0.
Step 4: the predicted grabbing pose of the end effector is back-projected into the three-dimensionally reconstructed scene, and the predicted grabbing quality is judged: the 6-degree-of-freedom grabbing pose output by the network is back-projected into the three-dimensional voxel scene established before grabbing, and a depth image under the predicted pose is rendered through view-frustum projection, as shown in figure 4. The view frustum is a three-dimensional volume whose position is related to the camera's extrinsic matrix; its shape determines how the scene is projected onto the image plane. Perspective projection uses a pyramid as the viewing volume, with the camera located at the apex of the pyramid. The pyramid is truncated by front and rear planes to form the frustum, called the View Frustum; its model is shown in figure 5, and only geometry located inside the frustum is visible. In this clipping space there are two planes, called the near clipping plane and the far clipping plane, respectively.
By comparing the depth information at the two ends of the gripper in the rendered image, the distance between the parallel jaws and the object is inferred, from which the grabbing quality of the predicted grabbing pose is judged, forming the closed-loop feedback of the 6-degree-of-freedom mechanical arm grabbing system.
Step 5: the mechanical arm grabbing motion is completed through robot forward and inverse kinematics:
First, when the depth at the two ends of the gripper in the depth image rendered in step 4 is smaller than or equal to the fingertip depth of the gripper, i.e. the predicted gripper pose straddles the two sides of the object, the predicted pose is judged graspable. Then the mechanical arm is controlled to execute the grabbing plan according to the predicted grabbing pose: the 6 joint angles of the mechanical arm in the current state are solved through robot inverse kinematics. Next, the predicted 6-dimensional grabbing pose of the mechanical arm end output by the reinforcement learning network in step 3 is fed into the robot forward kinematics to obtain the motion trajectory of the mechanical arm end effector from the current pose to the predicted pose point. After the end effector moves to the predicted grabbing pose, the robot sends a gripper-closing signal to attempt the grab. After the gripper closes, the end effector moves vertically upwards by 15 cm, an upward-moved depth image of the effector is obtained from the Intel RealSense D435 camera, and whether the actual grab succeeded is judged by computing the depth at the two ends of the gripper. When the grab succeeds, the reinforcement learning model reward is assigned a value of 1; when the grab fails, the reinforcement learning network reward is assigned a value of 0.
Step 6: training the reinforcement learning model to enable the mechanical arm to complete grabbing actions:
and (3) leading the trained model weight parameters into a real mechanical arm UR5, performing experimental debugging, and repeating the steps 1 to 5 to finish the six-degree-of-freedom closed-loop grabbing task of the mechanical arm. Through 200 rounds of grabbing tests, the grabbing success rate of the mechanical arm six-degree-of-freedom grabbing method based on TSDF for unstructured objects is 85.3% (10 target objects are randomly stacked). When the target object is increased to 15 random stacking situations, the grabbing success rate of the invention still has 81.5 percent. In summary, the invention designs the reinforcement learning network, overcomes the defects of complicated calculation of the grabbing gesture of the mechanical arm and high time cost by the traditional physical derivation, and solves the problem that the 6-DOF grabbing gesture of the mechanical arm cannot be applied to an unrecorded target object in a data set. The method not only ensures higher grabbing success rate of the mechanical arm model, but also is beneficial to generalization of reinforcement learning, can be applied to unrecorded grabbing objects, and solves the problems of time-consuming calculation and instability of the input part point cloud model of the traditional method.

Claims (4)

1. A six-degree-of-freedom visual closed-loop grabbing method of a mechanical arm based on TSDF three-dimensional reconstruction is characterized in that: the method comprises the following steps:
step 1: calibrating a mechanical arm base coordinate system and a camera coordinate system by the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method;
step 2: performing three-dimensional scene reconstruction on the acquired image information by using a TSDF function so as to reduce environmental noise points between objects; the method comprises the following specific steps:
2.1): visual information is obtained with a single sensor, and the information captured by the binocular camera on the grabbing operation table, comprising color information and depth information, is acquired through OpenCV;
2.2): because of environmental factors such as lighting and mutual occlusion between objects, large noise exists in the depth image acquired by the sensor, which affects the grabbing prediction made by the network model on the image; first, spherical viewpoint sampling of the grabbing operation table is performed before each grab, and the depth error in a single image is reduced by sampling multiple partial views; second, the sampled poses and the sampled image information are fused into a three-dimensional reconstruction voxel grid with the TSDF function; a single voxel contains not only its intuitive x, y, z coordinates but also two further values, SDF and CAM, representing the distance from the voxel to the surface, where SDF is the signed distance function used for depth map fusion: 0 represents the surface, positive values represent points outside the model, negative values represent points inside the model, and the magnitude is proportional to the distance from the point to the nearest surface; the surface is easily extracted as the zero crossing of the function; if the depth value D(u, v) in the i-th frame depth image is not 0, D(u, v) is compared with z in the voxel camera coordinates V(x, y, z): if D(u, v) is greater than z, the voxel is closer to the camera and reconstructs the outer surface of the scene; if D(u, v) is less than z, the voxel is farther from the camera and reconstructs the inner surface of the scene; that is, if the voxel point lies outside the surface, on the side closer to the camera, the SDF value is positive; if the voxel lies within the surface, the SDF value is negative;
2.3): defining the coordinate system of the voxel grid as the world coordinate system, in which the coordinates of the j-th voxel are expressed as V_j^w, the pose of the i-th camera view as T_i^w, and the intrinsic parameter matrix of the camera as K;
according to the transformation matrix obtained by ICP registration, the voxel grid in the world coordinate system is projected into the camera coordinate system and then, using the camera intrinsic matrix K, converted into the image coordinate system as I_ij, where i ∈ (0, 1, ..., n) and j ∈ (0, 1, ..., m);
after the voxel grid is projected into the image coordinate system, the initial SDF value of each voxel is first calculated, as shown in formula (6),
where V_j^w represents the position of the j-th voxel in the world coordinate system, p_i^w represents the position of the i-th camera pose in the world coordinate system, and dep(I_ij) represents the depth value of the j-th voxel in the i-th camera depth image coordinate system;
truncating the SDF value of each voxel according to formula (7);
where tsdf_i is the truncated distance value of the voxel and trunc represents the manually set truncation distance, which can be understood as the error bound of the camera's depth information: if the error is larger, trunc should be set larger, otherwise much of the information acquired by the depth camera may be lost; trunc is set to 1;
after truncating the SDF value of each voxel, the TSDF value of each voxel is recalculated according to formula (8):
where ω_j(x, y, z) is the weight of the voxel in the global voxel grid of the current frame, ω_{j-1}(x, y, z) is the weight of the voxel in the voxel grid of the previous frame, tsdf_j is the distance from the voxel to the object surface in the global data cube of the current frame, tsdf_{j-1} is the distance from the voxel to the object surface in the global data cube of the previous frame, V.z represents the z-axis coordinate of the voxel in the camera coordinate system, and D(u, v) represents the depth value at pixel (u, v) of the current frame depth image;
equation (9) calculates a weight value for each voxel;
w_j = w_{j-1} + w_j (9)
rendering environment information of the grabbing operation table from top to bottom and taking the environment information as input of a network model;
step 3: establishing a reinforcement learning network model;
step 4: the predicted grabbing pose of the end effector is back-projected into the three-dimensionally reconstructed scene, and the predicted grabbing quality is judged; the method comprises the following specific steps:
the predicted grabbing pose of the end effector is back-projected into the three-dimensionally reconstructed scene and the predicted grabbing quality is judged: the 6-degree-of-freedom grabbing pose output by the network is back-projected into the three-dimensional voxel scene established before grabbing, and a depth image under the predicted pose is rendered through view-frustum projection; by comparing the depth information at the two ends of the gripper in the rendered image, the distance between the parallel jaws and the object is inferred, from which the grabbing quality of the predicted grabbing pose is judged, forming the closed-loop feedback of the 6-degree-of-freedom mechanical arm grabbing system;
step 5: the grabbing motion of the mechanical arm is completed through robot forward and inverse kinematics; the method comprises the following specific steps:
when the depth at the two ends of the gripper in the rendered depth image is smaller than or equal to the fingertip depth of the gripper, i.e. the predicted gripper pose straddles the two sides of the object, the predicted pose is judged graspable; then the mechanical arm is controlled to execute the grabbing plan according to the predicted pose: the 6 joint angles of the mechanical arm in the current state are solved through robot inverse kinematics; then the predicted 6-dimensional grabbing pose of the mechanical arm end output by the reinforcement learning network model in step 3 is fed into the robot forward kinematics to obtain the motion trajectory of the mechanical arm end effector from the current pose to the predicted pose point; when the mechanical arm end effector reaches the predicted pose, the robot sends a gripper-closing signal to attempt the grabbing action; after the gripper closes, the end effector moves vertically upwards by 15 cm, an upward-moved depth image of the end effector is obtained from the binocular camera, and whether the actual grab succeeded is judged by computing the depth at the two ends of the gripper; when the grab succeeds, the reinforcement learning model reward is assigned a value of 1; when the grab fails, the reinforcement learning network reward is assigned a value of 0;
step 6: and performing reinforcement learning model training to enable the mechanical arm to complete grabbing actions.
2. The mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction of claim 1, characterized in that the specific steps of step 1 are as follows:
firstly, vertically fixing a binocular camera, an Intel D415 depth camera, at the end of the mechanical arm, so that the camera and the end effector are related by a fixed pose transformation and the binocular camera can acquire image information of objects on the grabbing operation table, the end effector being a gripping effector; then calibrating the internal and external parameters of the camera by the Zhang Zhengyou camera calibration method and the ArUco Markers calibration method, and constructing a three-dimensional coordinate system with the mechanical arm base as the origin; the camera internal and external parameter calibration formulas are as follows:
where x_w, y_w, z_w are the three-dimensional coordinates in the world coordinate system; x_c, y_c, z_c are the coordinates in the camera coordinate system; (u, v) are the pixel coordinates; (x_m, y_m, z_m) is a point in the assumed camera coordinate system; (x_p, y_p, z_p) is the imaging point in the image coordinate system; f is the imaging distance; d_x, d_y are the coordinates of the pixel point in the image coordinate system; R and T are the rotation matrix and the translation matrix; formula (1) is the formula from the world coordinate system to the camera coordinate system, formula (2) is the formula from the camera coordinate system to the ideal image coordinate system, and formula (3) is the matrix from the camera coordinate system to the pixel coordinate system, i.e. the intrinsic parameter matrix.
3. The mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction of claim 1, characterized in that the specific steps of step 3 are as follows:
3.1): designing a mechanical arm grabbing strategy and defining an action space for reinforcement learning: a two-dimensional convolutional network is designed and used, and the 6-DoF grabbing pose is directly regressed by increasing the network dimension, in contrast to methods that divide target grabbing into a grabbing position area and a three-dimensional rotation amount predicted by two separate network models; the grabbing action space of a single object is defined as 16 × 28 × 28 actions, where 16 is the number of rotation actions of the end effector about the z coordinate axis and 28 × 28 is the number of rotation actions of the end effector about the x and y coordinate axes;
3.2): each element in the array is converted into an angle that the tail end of the mechanical arm rotates around three coordinate axes of x, y and z respectively, and a specific conversion formula is as follows:
a_x = ((best_pix_ind[0] - 14) * 30/28) - pi (10)
b_y = ((best_pix_ind[1] - 14) * 30/28) (11)
r_z = (best_pix_ind[2] * 180/16) (12)
where a_x is the rotation angle of the mechanical arm end about the x-axis, i.e. the roll angle of the end effector; b_y is the rotation angle of the mechanical arm end about the y-axis, i.e. the pitch angle of the end effector; r_z is the rotation angle of the mechanical arm end about the z-axis, i.e. the yaw angle of the end effector; θ is the experimentally set deviation value;
3.3): designing a feature extraction network model: the model adopts a ResNet fully convolutional feature extraction network architecture and learns to extract unstructured features from RGB-D partial observations; after multi-layer convolutional feature extraction from the color image and the depth image, the color features and depth features are channel-connected through a Concat function, interleaved with 2 extra Batch Normalization layers, nonlinear activation functions and 1×1 convolutional layers, and the resolution of the output heat map is kept consistent with the input image through up-sampling;
selecting the pose in the action space corresponding to the coordinate of the maximum value in the output tensor as the predicted grabbing pose to output;
3.4): forward reasoning is performed on the network by the following formula:
where equation (13) represents the expected return under state s and action a, g_t denotes the grabbing action taken at time t, s_t the state at time t, and r_t the return at time t; equation (14) represents the overall return function of the network; equation (15) is the state distribution function; equation (16) represents the state-action function;
3.5): designing the reinforcement learning network loss function, adopting a cross-entropy loss computation to further improve the detection accuracy of the algorithm:
where τ = s_0 a_0 s_1 a_1 ... s_n a_n denotes a Markov process;
since P_r{a|s} = π(s, a), equation (18) is obtained;
the weight update function is as follows:
where f_ω: S × A → R is an approximation of the state-action function; when f_ω attains its minimum, Δω = 0, and equation (20) can be derived;
3.6): setting the reinforcement learning reward: if the grab succeeds, the agent reward is assigned a value of 1; otherwise it is assigned a value of 0.
4. The mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction of claim 1, characterized in that the specific steps of step 6 are as follows:
continuously repeating steps 2 to 5 in the CoppeliaSim simulation environment, and updating the model weight parameters by decreasing the cross-entropy loss function in the reinforcement learning network model driven by the reinforcement learning rewards; finally, importing the trained weight parameters into a real UR5 mechanical arm, carrying out experimental debugging, and repeating steps 1 to 5 to complete the six-degree-of-freedom closed-loop grabbing task of the mechanical arm.
CN202210551360.5A 2022-05-18 2022-05-18 Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction Active CN114851201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210551360.5A CN114851201B (en) 2022-05-18 2022-05-18 Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210551360.5A CN114851201B (en) 2022-05-18 2022-05-18 Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction

Publications (2)

Publication Number Publication Date
CN114851201A CN114851201A (en) 2022-08-05
CN114851201B true CN114851201B (en) 2023-09-05

Family

ID=82638419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210551360.5A Active CN114851201B (en) 2022-05-18 2022-05-18 Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction

Country Status (1)

Country Link
CN (1) CN114851201B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115890744B (en) * 2022-12-15 2024-07-26 武汉理工大学 TD 3-based mechanical arm 6-DOF object manipulation training method and system
CN115984388B (en) * 2023-02-28 2023-06-06 江西省智能产业技术创新研究院 Spatial positioning precision evaluation method, system, storage medium and computer
CN116524217B (en) * 2023-07-03 2023-08-25 北京七维视觉传媒科技有限公司 Human body posture image matching method and device, electronic equipment and storage medium
CN117549307B (en) * 2023-12-15 2024-04-16 安徽大学 Robot vision grabbing method and system in unstructured environment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9205562B1 (en) * 2014-08-29 2015-12-08 Google Inc. Integration of depth points into a height map
CN106548466A (en) * 2015-09-16 2017-03-29 富士通株式会社 The method and apparatus of three-dimensional reconstruction object
CN110310362A (en) * 2019-06-24 2019-10-08 中国科学院自动化研究所 High dynamic scene three-dimensional reconstruction method, system based on depth map and IMU
CN112476434A (en) * 2020-11-24 2021-03-12 新拓三维技术(深圳)有限公司 Visual 3D pick-and-place method and system based on cooperative robot
CN112801988A (en) * 2021-02-02 2021-05-14 上海交通大学 Object grabbing pose detection method based on RGBD and deep neural network
CN113192128A (en) * 2021-05-21 2021-07-30 华中科技大学 Mechanical arm grabbing planning method and system combined with self-supervision learning
CN113752255A (en) * 2021-08-24 2021-12-07 浙江工业大学 Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9205562B1 (en) * 2014-08-29 2015-12-08 Google Inc. Integration of depth points into a height map
CN106548466A (en) * 2015-09-16 2017-03-29 富士通株式会社 The method and apparatus of three-dimensional reconstruction object
CN110310362A (en) * 2019-06-24 2019-10-08 中国科学院自动化研究所 High dynamic scene three-dimensional reconstruction method, system based on depth map and IMU
CN112476434A (en) * 2020-11-24 2021-03-12 新拓三维技术(深圳)有限公司 Visual 3D pick-and-place method and system based on cooperative robot
CN112801988A (en) * 2021-02-02 2021-05-14 上海交通大学 Object grabbing pose detection method based on RGBD and deep neural network
CN113192128A (en) * 2021-05-21 2021-07-30 华中科技大学 Mechanical arm grabbing planning method and system combined with self-supervision learning
CN113752255A (en) * 2021-08-24 2021-12-07 浙江工业大学 Mechanical arm six-degree-of-freedom real-time grabbing method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN114851201A (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN114851201B (en) Mechanical arm six-degree-of-freedom visual closed-loop grabbing method based on TSDF three-dimensional reconstruction
JP6546618B2 (en) Learning apparatus, learning method, learning model, detection apparatus and gripping system
CN113524194A (en) Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
CN108994832B (en) Robot eye system based on RGB-D camera and self-calibration method thereof
CN109658460A (en) A kind of mechanical arm tail end camera hand and eye calibrating method and system
JP2019508273A (en) Deep-layer machine learning method and apparatus for grasping a robot
CN111085997A (en) Capturing training method and system based on point cloud acquisition and processing
CN109840940A (en) Dynamic three-dimensional reconstruction method, device, equipment, medium and system
CN111801198A (en) Hand-eye calibration method, system and computer storage medium
CN114912287A (en) Robot autonomous grabbing simulation system and method based on target 6D pose estimation
CN108416428A (en) A kind of robot visual orientation method based on convolutional neural networks
CN112512755A (en) Robotic manipulation using domain-invariant 3D representations predicted from 2.5D visual data
CN114387513A (en) Robot grabbing method and device, electronic equipment and storage medium
CN114347008A (en) Industrial robot-based method and device for grabbing workpieces out of order and intelligent terminal
CN114700949B (en) Mechanical arm smart grabbing planning method based on voxel grabbing network
CN115890744B (en) TD 3-based mechanical arm 6-DOF object manipulation training method and system
CN115194774A (en) Binocular vision-based control method for double-mechanical-arm gripping system
CN116616812A (en) NeRF positioning-based ultrasonic autonomous navigation method
JP7349423B2 (en) Learning device, learning method, learning model, detection device and grasping system
CN110722547A (en) Robot vision stabilization under model unknown dynamic scene
Bai et al. Kinect-based hand tracking for first-person-perspective robotic arm teleoperation
Peng et al. Improved Image-based Pose Regressor Models for Underwater Environments
Li A Design of Robot System for Rapidly Sorting Express Carton with Mechanical Arm Based on Computer Vision Technology
Berti et al. Human-robot interaction and tracking using low cost 3d vision systems
Zhang et al. Research on AUV Recovery by Use of Manipulator Based on Vision Servo

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant