CN111897327B - Multi-mobile robot control/dispatch model acquisition method and device and electronic equipment
- Publication number
- CN111897327B (application CN202010675357.5A)
- Authority
- CN
- China
- Prior art keywords
- mobile robot
- information
- position information
- dispatch model
- agent
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G05D1/0221 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
- G05D1/0276 - Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
- Y02P90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
The invention provides a method, an apparatus, and an electronic device for acquiring a control/dispatch model for multiple mobile robots. The method comprises: acquiring the initial position information and identity information of each mobile robot; acquiring target position information; acquiring a multi-mobile-robot dispatch model; inputting the initial position information, identity information, and target position information into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and sending the destination information and motion path information to the corresponding mobile robot. The dispatch model assigns destinations and plans motion paths for the mobile robots based on a multi-agent reinforcement learning algorithm. Position assignment and real-time motion planning can therefore be completed rapidly, so that the mobile robots can move simultaneously and complete the position assignment task without collision.
Description
Technical Field
The present invention relates to the field of robotics, and in particular to a method and an apparatus for acquiring a control/dispatch model for multiple mobile robots, and to an electronic device.
Background
With the development of technology and the deepening application of robotics, multi-mobile-robot systems are being deployed in more and more scenarios, and the complexity and diversity of these scenarios place higher demands on their control. Control of multiple mobile robots centers on two problems: position assignment and motion planning.
In traditional multi-mobile-robot control methods, the initial and final formation states are usually determined first, an optimization algorithm then assigns each robot to an end position, and during motion planning a method such as sequential or priority-based assignment moves the robots to their assigned targets so as to avoid collisions. This approach is inefficient and cannot cope with complex scenes.
Therefore, a multi-mobile-robot control method is needed that can rapidly complete position assignment and real-time motion planning, so that multiple mobile robots can move simultaneously and complete the position assignment task without collision.
Disclosure of Invention
In view of the above shortcomings of the prior art, an object of the embodiments of the present application is to provide a method, an apparatus, and an electronic device for acquiring a multi-mobile-robot control/dispatch model that can rapidly complete position assignment and real-time motion planning, so that multiple mobile robots can move simultaneously and complete the position assignment task without collision.
In a first aspect, an embodiment of the present application provides a method for acquiring a multi-mobile-robot dispatch model, including the steps of:
A1. establishing a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; treating each mobile robot as an agent in the simulation scene and setting the movable directions of each agent;
A2. setting an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm;
A3. repeatedly training the Actor and Critic networks according to a preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves;
wherein the preset reward-and-punishment mechanism is: the reward is based on the distance between an agent and its closest target position; a first negative value is added when the agent collides, and a second negative value is added for each exploration step taken before the target position is reached.
In the method, in step A3, the priority level of each mobile robot is used as the weight value of the corresponding agent; the weighted sum of the reward values obtained by all agents is taken as the total reward; and the average reward is calculated as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
In a second aspect, an embodiment of the present application provides a method for controlling multiple mobile robots, including the steps of:
S1. acquiring the initial position information and identity information of each mobile robot;
S2. acquiring target position information;
S3. acquiring a multi-mobile-robot dispatch model, obtained by the dispatch model acquisition method described above;
S4. inputting the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot;
S5. sending the destination information and motion path information to the corresponding mobile robot.
In the method, in step S5, the motion path information is sent to the corresponding mobile robot step by step, and the motion path information of each step includes motion direction information and driving force information.
The method further includes, before step S1:
S0. sending a guide instruction to each mobile robot to guide it to a preset initial position.
In a third aspect, an embodiment of the present application provides a multi-mobile-robot dispatch model acquisition apparatus, including:
a first execution module, configured to establish a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; to treat each mobile robot as an agent in the simulation scene; and to set the movable directions of each agent;
a second execution module, configured to set an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm;
a third execution module, configured to repeatedly train the Actor and Critic networks according to a preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves;
wherein the preset reward-and-punishment mechanism is: the reward is based on the distance between an agent and its closest target position; a first negative value is added when the agent collides, and a second negative value is added for each exploration step taken before the target position is reached.
In the apparatus, the third execution module uses the priority level of each mobile robot as the weight value of the corresponding agent, takes the weighted sum of the reward values obtained by all agents as the total reward, and calculates the average reward as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
In a fourth aspect, an embodiment of the present application provides a multi-mobile-robot control device, including:
a first acquisition module, configured to acquire the initial position information and identity information of each mobile robot;
a second acquisition module, configured to acquire target position information;
a third acquisition module, configured to acquire a multi-mobile-robot dispatch model obtained by the dispatch model acquisition method described above;
a fourth acquisition module, configured to input the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot;
and a first sending module, configured to send the destination information and motion path information to the corresponding mobile robot.
In the device, the first sending module sends the motion path information to the corresponding mobile robot step by step, and the motion path information of each step includes motion direction information and driving force information.
In a fifth aspect, an embodiment of the present application provides an electronic device including a processor and a memory, where the memory stores a computer program, and the processor is configured to execute the multi-mobile robot control method by calling the computer program stored in the memory.
Beneficial effects:
The embodiments of the present application provide a method, an apparatus, and an electronic device for acquiring a multi-mobile-robot control/dispatch model. The initial position information and identity information of each mobile robot are acquired; target position information is acquired; a multi-mobile-robot dispatch model is acquired; the initial position information, identity information, and target position information are input into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and the destination information and motion path information are sent to the corresponding mobile robot. The dispatch model assigns destinations and plans motion paths for the mobile robots based on a multi-agent reinforcement learning algorithm, so position assignment and real-time motion planning can be completed rapidly and the mobile robots can move simultaneously and complete the position assignment task without collision.
Drawings
Fig. 1 is a flowchart of a method for controlling a multi-mobile robot according to an embodiment of the present application.
Fig. 2 is a block diagram of a multi-mobile robot control device according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 4 is a flowchart of a multi-mobile robot dispatch model acquisition method according to an embodiment of the present application.
Fig. 5 is a block diagram of a multi-mobile robot dispatch model acquisition device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 4, a method for acquiring a multi-mobile-robot dispatch model according to an embodiment of the present application includes the steps of:
A1. establishing a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; treating each mobile robot as an agent in the simulation scene and setting the movable directions of each agent;
A2. setting an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm;
A3. repeatedly training the Actor and Critic networks according to a preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves;
wherein the preset reward-and-punishment mechanism is: the reward R is based on the distance between an agent and its closest target position; a first negative value R1 is added when the agent collides, and a second negative value R2 is added for each exploration step taken before the target position is reached.
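As a concrete illustration, here is a minimal Python sketch of such a per-step reward. The distance shaping (the negative of the distance to the nearest target) and the magnitudes of R1 and R2 are assumptions for illustration; the patent fixes only their signs and roles.

```python
import math

R1 = -10.0  # first negative value, added on collision (magnitude assumed)
R2 = -0.1   # second negative value, added per exploration step (magnitude assumed)

def step_reward(agent_pos, targets, collided, reached_target):
    # Distance-based term: the closer the agent is to its nearest target,
    # the higher (less negative) the reward.
    d_min = min(math.hypot(agent_pos[0] - tx, agent_pos[1] - ty)
                for tx, ty in targets)
    reward = -d_min
    if collided:
        reward += R1          # penalty for colliding with an agent or obstacle
    if not reached_target:
        reward += R2          # cost of one more exploration step
    return reward
```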
Owing to this reward-and-punishment mechanism and training objective, the multi-mobile-robot dispatch model obtained by the method assigns a destination to each mobile robot and plans its motion path, and ensures that the robots can move simultaneously and reach their destinations without collision. In the multi-agent deep deterministic policy gradient algorithm, the Critic network continuously reinforces, over training iterations, the actions with the highest expected return; in the multi-robot position assignment scene (i.e., the simulation scene), such actions steadily drive the positions toward the optimal configuration of the scene, and the trained, converged model finally yields the optimal state sequence, that is, the optimal paths for the scene.
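For concreteness, the following is a minimal PyTorch sketch of the per-agent Actor and centralized Critic setup of step A2. The class names, layer sizes, and dimensions are illustrative assumptions, not the patented implementation; MADDPG conventionally gives each agent a decentralized actor and a critic conditioned on the joint observations and actions of all agents, which is what the sketch shows.

```python
import torch
import torch.nn as nn

N_ACTIONS = 5  # hold, up, down, left, right (the movable directions set in A1)

class Actor(nn.Module):
    """Maps one agent's local observation to a distribution over the 5 moves."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, N_ACTIONS),
        )
    def forward(self, obs):
        return torch.softmax(self.net(obs), dim=-1)

class Critic(nn.Module):
    """Centralized critic: scores the joint observations and actions of all agents."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# One Actor and one Critic per agent, as in step A2 (sizes assumed).
n_agents, obs_dim = 4, 10
actors = [Actor(obs_dim) for _ in range(n_agents)]
critics = [Critic(n_agents * obs_dim, n_agents * N_ACTIONS) for _ in range(n_agents)]
```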
In some embodiments, in step A3, the priority level of each mobile robot is used as the weight value of the corresponding agent; the weighted sum of the reward values obtained by all agents is taken as the total reward; and the average reward is calculated as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
By introducing these weights, mobile robots with higher priority levels are assigned first during task assignment. The user can preset the priority level of each mobile robot according to actual needs (for example, the levels can be represented by the numbers 1 to 10, with larger values indicating higher priority), so that the assignment result better matches the user's expectations.
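A minimal sketch of this weighting follows; the per-agent values are hypothetical, and dividing the weighted total by the number of agents follows the formula as reconstructed above.

```python
def total_and_average_reward(rewards, weights):
    # Total reward: weighted sum of per-agent rewards, sum_i w_i * r_i.
    total = sum(w * r for w, r in zip(weights, rewards))
    # Average reward: total divided by the number of agents.
    return total, total / len(rewards)

# Hypothetical values: three agents with priority levels 3, 1, 2.
rewards = [-1.2, -0.4, -0.9]
weights = [3, 1, 2]
total, average = total_and_average_reward(rewards, weights)
```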
In step A1, the movable directions set for each agent are: hold, move up, move down, move left, and move right. In some embodiments, when the Actor and Critic networks are trained in step A3, a change in an agent's motion state is realized by applying a force f to the agent in a specified direction. Therefore, when the obtained motion path comprises multiple steps (each change of motion state counting as one step), the motion direction and driving force of every step are obtained, and a mobile robot that needs to move along the planned path only has to execute the steps one by one according to the motion direction and driving force of each step, as sketched below.
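Under these assumptions, the action-to-force mapping might look like the following sketch; the force magnitude F and the coordinate convention are illustrative, since the patent does not fix them.

```python
F = 1.0  # driving-force magnitude per step (assumed)

# The five movable directions of step A1 mapped to a planar force f = (fx, fy).
ACTION_TO_FORCE = {
    "hold":  (0.0, 0.0),
    "up":    (0.0, F),
    "down":  (0.0, -F),
    "left":  (-F, 0.0),
    "right": (F, 0.0),
}

def step_command(action):
    """Return the force applied to the agent for one step of the planned path."""
    return ACTION_TO_FORCE[action]
```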
During the training of step A3, the distance between each agent and every other agent is calculated; if this distance is smaller than the sum of the collision radii of the two agents, they are considered to have collided, and the first negative value R1 is added to the agent's reward. Likewise, the distance between each agent and every obstacle is calculated; if it is smaller than the sum of the collision radii of the agent and the obstacle, the agent is considered to have collided with the obstacle, and R1 is added to its reward.
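A short sketch of this collision test, using only the center positions and collision radii described above:

```python
import math

def collided(pos_a, radius_a, pos_b, radius_b):
    # Two bodies collide when their center distance is below the sum of
    # their collision radii.
    return math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1]) < radius_a + radius_b

def collision_count(agent_pos, agent_radius, others):
    """Count collisions with the other agents and obstacles in `others`,
    given as (position, radius) pairs; each one adds R1 to the reward."""
    return sum(1 for pos, r in others
               if collided(agent_pos, agent_radius, pos, r))
```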
From the above, the multi-mobile-robot dispatch model acquired by this method has the following advantages:
1. The multi-agent reinforcement learning algorithm is applied to both position assignment and motion planning of the mobile robots, completing motion planning at the same time as position assignment; compared with the traditional approach of designing position assignment and motion planning separately, this is more efficient.
2. Weights are introduced into the reward accumulation, so the algorithm reflects the real-world priorities of the mobile robots, which gives it more practical significance.
3. For a complex environment, a corresponding obstacle model is built in the simulation scene using only each obstacle's center position and collision radius, so the method is suitable for complex scenes and has a wider application space and better scalability.
Referring to fig. 5, an embodiment of the present application further provides a multi-mobile-robot dispatch model acquisition device, which includes a first execution module 1, a second execution module 2, and a third execution module 3.
The first execution module 1 is configured to establish a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; to treat each mobile robot as an agent in the simulation scene; and to set the movable directions of each agent.
The second execution module 2 is configured to set an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm.
The third execution module 3 is configured to repeatedly train the Actor and Critic networks according to the preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves.
The preset reward-and-punishment mechanism is: the reward R is based on the distance between an agent and its closest target position; a first negative value R1 is added when the agent collides, and a second negative value R2 is added for each exploration step taken before the target position is reached.
In some embodiments, the third execution module 3 uses the priority level of each mobile robot as the weight value of the corresponding agent, takes the weighted sum of the reward values obtained by all agents as the total reward, and calculates the average reward as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
From the above, the multi-mobile-robot dispatch model acquisition device has the following advantages:
1. The multi-agent reinforcement learning algorithm is applied to both position assignment and motion planning of the mobile robots, completing motion planning at the same time as position assignment; compared with the traditional approach of designing position assignment and motion planning separately, this is more efficient.
2. Weights are introduced into the reward accumulation, so the algorithm reflects the real-world priorities of the mobile robots, which gives it more practical significance.
3. For a complex environment, a corresponding obstacle model is built in the simulation scene using only each obstacle's center position and collision radius, so the method is suitable for complex scenes and has a wider application space and better scalability.
In addition, an embodiment of the present application further provides an electronic device comprising a processor and a memory, where the memory stores a computer program and the processor executes the above multi-mobile-robot dispatch model acquisition method by calling the computer program stored in the memory.
The processor is electrically connected with the memory. The processor is the control center of the electronic device: it connects the various parts of the whole device through various interfaces and lines, and performs the functions of the device and processes its data by running or calling the computer program stored in the memory and the data stored in the memory, thereby monitoring the electronic device as a whole.
The memory may be used to store the computer program and data. The stored computer program contains instructions executable by the processor and may constitute various functional modules. The processor executes various functional applications and data processing by calling the computer program stored in the memory.
In this embodiment, the processor of the electronic device loads instructions corresponding to the processes of one or more computer programs into the memory and executes the stored computer programs to implement the following functions: establishing a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; treating each mobile robot as an agent in the simulation scene and setting the movable directions of each agent; setting an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm; and repeatedly training the Actor and Critic networks according to the preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves; where the preset reward-and-punishment mechanism is: the reward R is based on the distance between an agent and its closest target position; a first negative value R1 is added when the agent collides, and a second negative value R2 is added for each exploration step taken before the target position is reached.
From the above, the electronic device has the following advantages:
1. The multi-agent reinforcement learning algorithm is applied to both position assignment and motion planning of the mobile robots, completing motion planning at the same time as position assignment; compared with the traditional approach of designing position assignment and motion planning separately, this is more efficient.
2. Weights are introduced into the reward accumulation, so the algorithm reflects the real-world priorities of the mobile robots, which gives it more practical significance.
3. For a complex environment, a corresponding obstacle model is built in the simulation scene using only each obstacle's center position and collision radius, so the method is suitable for complex scenes and has a wider application space and better scalability.
Referring to fig. 1, an embodiment of the present application further provides a method for controlling multiple mobile robots, including the steps of:
S1. acquiring the initial position information and identity information of each mobile robot;
S2. acquiring target position information;
S3. acquiring a multi-mobile-robot dispatch model, obtained by the dispatch model acquisition method described above;
S4. inputting the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot;
S5. sending the destination information and motion path information to the corresponding mobile robot.
The multi-mobile-robot control method runs on the control server of the mobile robots.
The multi-mobile-robot dispatch model realizes position assignment and motion path planning for multiple mobile robots, so position assignment and real-time motion planning can be completed rapidly and the robots can move simultaneously and complete the position assignment task without collision. When weights are introduced into the reward accumulation of the dispatch model, the algorithm reflects the real-world priorities of the mobile robots, which gives it more practical significance; the method also suits complex scenes and has a wider application space and better scalability.
The identity information may be a user-defined number of the mobile robot, or the MAC address of the robot's communication module. In the multi-mobile-robot dispatch model, each mobile robot corresponds to an agent through its identity information.
In some embodiments, during training of the multi-mobile-robot dispatch model, a change in an agent's motion state is realized by applying a force f to the agent in a specified direction; the obtained motion path thus comprises multiple steps (each change of motion state counting as one step), and the motion direction and driving force of every step are available. Correspondingly, in step S5, the motion path information may be sent step by step to the corresponding mobile robot, the information for each step comprising a motion direction and a driving force. The robot reaches its assigned destination along the corresponding path simply by executing the steps one by one according to the motion direction and driving force of each step; the logic is simple and easy to implement.
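As an illustration of step-by-step transmission, a hypothetical per-step message might look as follows; the field names and the transport mechanism are assumptions, not the patent's protocol.

```python
from dataclasses import dataclass

@dataclass
class StepCommand:
    robot_id: str   # identity information of the addressed robot
    direction: str  # one of: hold, up, down, left, right
    force: float    # driving force for this step

def send_path(robot_id, path, transmit):
    # Send the planned path one step at a time, as described for step S5;
    # `transmit` stands in for whatever channel the control server uses.
    for direction, force in path:
        transmit(StepCommand(robot_id, direction, force))
```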
For a pre-trained multi-mobile-robot dispatch model, the initial position of each mobile robot must not differ too much from the initial positions used during training; otherwise, retraining is required. Thus, in some preferred embodiments, the method further includes, before step S1:
S0. sending a guide instruction to each mobile robot to guide it to a preset initial position.
The guide instruction comprises a lookup table between mobile robot identity information and preset initial position coordinates, together with a guide trigger signal. After a mobile robot recognizes the trigger signal, it queries the lookup table with its own identity information to obtain its initial position coordinates, moves to that initial position, and, upon arrival, sends its initial position information and identity information to the server.
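A minimal sketch of such a guide instruction and the robot-side handling follows; the trigger name, robot identifiers, and coordinates are illustrative assumptions.

```python
# Guide instruction: a trigger signal plus a lookup table from robot identity
# information to preset initial coordinates.
GUIDE_INSTRUCTION = {
    "trigger": "GUIDE_START",
    "initial_positions": {
        "robot-01": (0.0, 0.0),
        "robot-02": (1.0, 0.0),
        "robot-03": (0.0, 1.0),
    },
}

def on_guide(robot_id, instruction, move_to, report):
    """Robot-side handling: on the trigger signal, look up the preset start
    position for this robot, move there, and report back to the server."""
    if instruction["trigger"] == "GUIDE_START":
        x, y = instruction["initial_positions"][robot_id]
        move_to(x, y)
        report(robot_id, (x, y))  # send id + initial position to the server
```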
According to the multi-mobile-robot control method, the initial position information and identity information of each mobile robot are acquired; target position information is acquired; a multi-mobile-robot dispatch model is acquired; the initial position information, identity information, and target position information are input into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and the destination information and motion path information are sent to the corresponding mobile robot. The dispatch model assigns destinations and plans motion paths for the mobile robots based on a multi-agent reinforcement learning algorithm, so position assignment and real-time motion planning can be completed rapidly and the robots can move simultaneously and complete the position assignment task without collision.
Referring to fig. 2, an embodiment of the present application further provides a multi-mobile-robot control device, which includes a first acquisition module 91, a second acquisition module 92, a third acquisition module 93, a fourth acquisition module 94, and a first sending module 95.
The first acquisition module 91 is configured to acquire the initial position information and identity information of each mobile robot.
The second acquisition module 92 is configured to acquire target position information.
The third acquisition module 93 is configured to acquire a multi-mobile-robot dispatch model obtained by the dispatch model acquisition method described above.
The fourth acquisition module 94 is configured to input the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot.
The first sending module 95 is configured to send the destination information and motion path information to the corresponding mobile robot.
In some embodiments, the first sending module 95 sends the motion path information to the corresponding mobile robot step by step, the motion path information of each step comprising a motion direction and a driving force.
As can be seen from the above, the multi-mobile-robot control device acquires the initial position information and identity information of each mobile robot; acquires target position information; acquires a multi-mobile-robot dispatch model; inputs the initial position information, identity information, and target position information into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and sends the destination information and motion path information to the corresponding mobile robot. The dispatch model assigns destinations and plans motion paths based on a multi-agent reinforcement learning algorithm, so position assignment and real-time motion planning can be completed rapidly and the robots can move simultaneously and complete the position assignment task without collision.
Referring to fig. 3, an embodiment of the present application further provides an electronic device 100 comprising a processor 101 and a memory 102, where the memory 102 stores a computer program and the processor 101 executes the above multi-mobile-robot control method by calling the computer program stored in the memory 102.
The processor 101 is electrically connected with the memory 102. The processor 101 is the control center of the electronic device 100: it connects the various parts of the whole device through various interfaces and lines, and performs the functions of the device and processes its data by running or calling the computer program stored in the memory 102 and the data stored in the memory 102, thereby monitoring the electronic device as a whole.
The memory 102 may be used to store the computer program and data. The stored computer program contains instructions executable by the processor and may constitute various functional modules. The processor 101 executes various functional applications and data processing by calling the computer program stored in the memory 102.
In this embodiment, the processor 101 of the electronic device 100 loads instructions corresponding to the processes of one or more computer programs into the memory 102 and executes the stored computer programs to implement the following functions: acquiring the initial position information and identity information of each mobile robot; acquiring target position information; acquiring a multi-mobile-robot dispatch model; inputting the initial position information, identity information, and target position information into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and sending the destination information and motion path information to the corresponding mobile robot.
From the above, the electronic device acquires the initial position information and identity information of each mobile robot; acquires target position information; acquires a multi-mobile-robot dispatch model; inputs the initial position information, identity information, and target position information into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and sends the destination information and motion path information to the corresponding mobile robot. The dispatch model assigns destinations and plans motion paths for the mobile robots based on a multi-agent reinforcement learning algorithm, so position assignment and real-time motion planning can be completed rapidly and the robots can move simultaneously and complete the position assignment task without collision.
In summary, although the present invention has been described with reference to the preferred embodiments, it is not limited thereto, and various modifications and variations can be made by those skilled in the art without departing from the spirit and scope of the present invention.
Claims (8)
1. A method for acquiring a multi-mobile-robot dispatch model, comprising the steps of:
A1. establishing a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and preset priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; treating each mobile robot as an agent in the simulation scene and setting the movable directions of each agent;
A2. setting an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm;
A3. repeatedly training the Actor and Critic networks according to a preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves;
wherein the preset reward-and-punishment mechanism is: the reward is based on the distance between an agent and its closest target position; a first negative value is added when the agent collides, and a second negative value is added for each exploration step taken before the target position is reached;
and wherein, in step A3, the priority level of each mobile robot is used as the weight value of the corresponding agent; the weighted sum of the reward values obtained by all agents is taken as the total reward; and the average reward is calculated as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
2. A method of controlling multiple mobile robots, comprising the steps of:
S1. acquiring the initial position information and identity information of each mobile robot;
S2. acquiring target position information;
S3. acquiring a multi-mobile-robot dispatch model, obtained by the multi-mobile-robot dispatch model acquisition method of claim 1;
S4. inputting the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot;
S5. sending the destination information and motion path information to the corresponding mobile robot.
3. The multi-mobile-robot control method according to claim 2, wherein in step S5 the motion path information is sent step by step to the corresponding mobile robot, the motion path information of each step comprising motion direction information and driving force information.
4. The multi-mobile-robot control method according to claim 2, further comprising, before step S1:
S0. sending a guide instruction to each mobile robot to guide it to a preset initial position.
5. A multi-mobile-robot dispatch model acquisition device, comprising:
a first execution module, configured to establish a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and preset priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; to treat each mobile robot as an agent in the simulation scene; and to set the movable directions of each agent;
a second execution module, configured to set an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm;
a third execution module, configured to repeatedly train the Actor and Critic networks according to a preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves;
wherein the preset reward-and-punishment mechanism is: the reward is based on the distance between an agent and its closest target position; a first negative value is added when the agent collides, and a second negative value is added for each exploration step taken before the target position is reached;
and wherein the third execution module uses the priority level of each mobile robot as the weight value of the corresponding agent, takes the weighted sum of the reward values obtained by all agents as the total reward, and calculates the average reward as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
6. A multi-mobile-robot control device, comprising:
a first acquisition module, configured to acquire the initial position information and identity information of each mobile robot;
a second acquisition module, configured to acquire target position information;
a third acquisition module, configured to acquire a multi-mobile-robot dispatch model obtained by the multi-mobile-robot dispatch model acquisition method of claim 1;
a fourth acquisition module, configured to input the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot;
and a first sending module, configured to send the destination information and motion path information to the corresponding mobile robot.
7. The multi-mobile-robot control device of claim 6, wherein the first sending module sends the motion path information to the corresponding mobile robot step by step, the motion path information of each step comprising motion direction information and driving force information.
8. An electronic device comprising a processor and a memory, the memory storing a computer program, the processor being configured to execute the multi-mobile-robot control method of any one of claims 2 to 4 by calling the computer program stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010675357.5A CN111897327B (en) | 2020-07-14 | 2020-07-14 | Multi-mobile robot control/dispatch model acquisition method and device and electronic equipment
Publications (2)
Publication Number | Publication Date
---|---
CN111897327A (en) | 2020-11-06
CN111897327B (en) | 2024-02-23
Family
ID=73191751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010675357.5A Active CN111897327B (en) | 2020-07-14 | 2020-07-14 | Multi-mobile robot control/dispatch model acquisition method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111897327B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114516044B (en) * | 2020-11-20 | 2024-12-20 | 炬星科技(深圳)有限公司 | Robot path planning method, device and storage medium |
CN113459109B (en) * | 2021-09-03 | 2021-11-26 | 季华实验室 | Mechanical arm path planning method and device, electronic equipment and storage medium |
CN114454162B (en) * | 2022-01-10 | 2023-05-26 | 广东技术师范大学 | Mobile robot complex intersection anti-collision method and system |
CN117527570B (en) * | 2023-12-18 | 2024-05-17 | 无锡北微传感科技有限公司 | Sensor cluster position optimization method based on edge reinforcement learning |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN107704980A (en) * | 2017-05-24 | 2018-02-16 | 重庆大学 | Multi-agent autonomous decision-making method for handling newly added express pickup demands
CN110084375A (en) * | 2019-04-26 | 2019-08-02 | 东南大学 | Hierarchical division framework based on deep reinforcement learning
CN110632931A (en) * | 2019-10-09 | 2019-12-31 | 哈尔滨工程大学 | Collision avoidance planning method for mobile robots based on deep reinforcement learning in dynamic environments
CN110794842A (en) * | 2019-11-15 | 2020-02-14 | 北京邮电大学 | Reinforcement learning path planning algorithm based on potential field
CN110991972A (en) * | 2019-12-14 | 2020-04-10 | 中国科学院深圳先进技术研究院 | Cargo transportation system based on multi-agent reinforcement learning
Also Published As
Publication number | Publication date |
---|---|
CN111897327A (en) | 2020-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111897327B (en) | Multi-mobile robot control/dispatch model acquisition method and device and electronic equipment | |
Yamada et al. | Motion planner augmented reinforcement learning for robot manipulation in obstructed environments | |
US11235467B2 (en) | System and method for trajectory planning for manipulators in robotic finishing applications | |
WO2019113067A2 (en) | Viewpoint invariant visual servoing of robot end effector using recurrent neural network | |
CN109407603B (en) | Method and device for controlling mechanical arm to grab object | |
JP7517225B2 (en) | Trajectory generation system, trajectory generation method, and program | |
CN101092032A (en) | Controlling the interactive behavior of a robot | |
WO2021178872A1 (en) | Trajectory optimization using neural networks | |
CN107457780B (en) | Method and device for controlling mechanical arm movement, storage medium and terminal equipment | |
Baba et al. | Collision avoidance planning of a robot manipulator by using genetic algorithm. A consideration for the problem in which moving obstacles and/or several robots are included in the workspace | |
Uchibe et al. | Cooperative behavior acquisition in multi-mobile robots environment by reinforcement learning based on state vector estimation | |
US12032343B2 (en) | Control system for controlling a machine using a control agent with parallel training of the control agent | |
Gao et al. | Effectively rearranging heterogeneous objects on cluttered tabletops | |
Imtiaz et al. | Implementing Robotic Pick and Place with Non-visual Sensing Using Reinforcement Learning | |
CN115249333B (en) | Grabbing network training method, grabbing network training system, electronic equipment and storage medium | |
CN114683280B (en) | Object control method and device, storage medium and electronic equipment | |
CN117138348A (en) | AI model construction method, agent control method, device, and storage medium | |
CN113218399A (en) | Maze navigation method and device based on multi-agent layered reinforcement learning | |
Lyu et al. | Asynchronous, option-based multi-agent policy gradient: A conditional reasoning approach | |
Wang et al. | The coordination of intelligent robots: A case study | |
JP2778922B2 (en) | How to determine the work order and work sharing of multiple robots | |
CN119141536A (en) | Multi-mechanical arm deep reinforcement learning control method and device for regenerated article sorting | |
US20250068966A1 (en) | Human-in-the-loop task and motion planning for imitation learning | |
Gómez et al. | Learning Manipulation Tasks: A multi-agent approach Technical Report No. CCC-23-004 | |
CN119501923A (en) | Human-in-the-loop tasks and motion planning in imitation learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||