CN111897327B - Multi-mobile robot control/dispatch model acquisition method and device and electronic equipment
- Publication number
- CN111897327B (application CN202010675357.5A)
- Authority
- CN
- China
- Prior art keywords
- mobile robot
- information
- position information
- dispatch model
- agent
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G05D1/0221 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
- G05D1/0276 - Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
- Y02P90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
The invention provides a method, an apparatus, and an electronic device for acquiring a control/dispatch model for multiple mobile robots. The method comprises: acquiring the initial position information and identity information of each mobile robot; acquiring target position information; acquiring a multi-mobile-robot dispatch model; inputting the initial position information, identity information, and target position information into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and sending the destination information and motion path information to the corresponding mobile robot. The dispatch model assigns destinations and plans motion paths for the mobile robots based on a multi-agent reinforcement learning algorithm. Position assignment and real-time motion planning can therefore be completed rapidly, so that the mobile robots can move simultaneously and complete the position assignment task without collision.
Description
Technical Field
The present invention relates to the field of robotics, and in particular to a method and an apparatus for acquiring a control/dispatch model for multiple mobile robots, and to an electronic device.
Background
With the development of technology and the deepening application of robotics, multi-mobile-robot systems are being deployed in more and more scenarios, and the complexity and diversity of these scenarios place higher demands on their control. Control of multiple mobile robots centers on two problems: position assignment and motion planning.
In traditional multi-mobile-robot control methods, the initial and final formation states are usually determined first, an optimization algorithm then assigns each robot to an end position, and during motion planning a method such as sequential or priority-based assignment moves the robots to their assigned targets so as to avoid collisions. This approach is inefficient and cannot cope with complex scenes.
Therefore, a multi-mobile-robot control method is needed that can rapidly complete position assignment and real-time motion planning, so that multiple mobile robots can move simultaneously and complete the position assignment task without collision.
Disclosure of Invention
In view of the above shortcomings of the prior art, an object of the embodiments of the present application is to provide a method, an apparatus, and an electronic device for acquiring a multi-mobile-robot control/dispatch model that can rapidly complete position assignment and real-time motion planning, so that multiple mobile robots can move simultaneously and complete the position assignment task without collision.
In a first aspect, an embodiment of the present application provides a method for acquiring a multi-mobile-robot dispatch model, including the steps of:
A1. establishing a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; treating each mobile robot as an agent in the simulation scene and setting the movable directions of each agent;
A2. setting an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm;
A3. repeatedly training the Actor and Critic networks according to a preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves;
wherein the preset reward-and-punishment mechanism is: the reward is based on the distance between an agent and its closest target position; a first negative value is added when the agent collides, and a second negative value is added for each exploration step taken before the target position is reached.
In the method, in step A3, the priority level of each mobile robot is used as the weight value of the corresponding agent; the weighted sum of the reward values obtained by all agents is taken as the total reward; and the average reward is calculated as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
In a second aspect, an embodiment of the present application provides a method for controlling multiple mobile robots, including the steps of:
S1. acquiring the initial position information and identity information of each mobile robot;
S2. acquiring target position information;
S3. acquiring a multi-mobile-robot dispatch model, obtained by the dispatch model acquisition method described above;
S4. inputting the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot;
S5. sending the destination information and motion path information to the corresponding mobile robot.
In the method, in step S5, the motion path information is sent to the corresponding mobile robot step by step, and the motion path information of each step includes motion direction information and driving force information.
The method further includes, before step S1:
S0. sending a guide instruction to each mobile robot to guide it to a preset initial position.
In a third aspect, an embodiment of the present application provides a multi-mobile-robot dispatch model acquisition apparatus, including:
a first execution module, configured to establish a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; to treat each mobile robot as an agent in the simulation scene; and to set the movable directions of each agent;
a second execution module, configured to set an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm;
a third execution module, configured to repeatedly train the Actor and Critic networks according to a preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves;
wherein the preset reward-and-punishment mechanism is: the reward is based on the distance between an agent and its closest target position; a first negative value is added when the agent collides, and a second negative value is added for each exploration step taken before the target position is reached.
In the apparatus, the third execution module uses the priority level of each mobile robot as the weight value of the corresponding agent, takes the weighted sum of the reward values obtained by all agents as the total reward, and calculates the average reward as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
In a fourth aspect, an embodiment of the present application provides a multi-mobile-robot control device, including:
a first acquisition module, configured to acquire the initial position information and identity information of each mobile robot;
a second acquisition module, configured to acquire target position information;
a third acquisition module, configured to acquire a multi-mobile-robot dispatch model obtained by the dispatch model acquisition method described above;
a fourth acquisition module, configured to input the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot;
and a first sending module, configured to send the destination information and motion path information to the corresponding mobile robot.
In the device, the first sending module sends the motion path information to the corresponding mobile robot step by step, and the motion path information of each step includes motion direction information and driving force information.
In a fifth aspect, an embodiment of the present application provides an electronic device including a processor and a memory, where the memory stores a computer program, and the processor is configured to execute the multi-mobile robot control method by calling the computer program stored in the memory.
Beneficial effects:
The embodiments of the present application provide a method, an apparatus, and an electronic device for acquiring a multi-mobile-robot control/dispatch model. The initial position information and identity information of each mobile robot are acquired; target position information is acquired; a multi-mobile-robot dispatch model is acquired; the initial position information, identity information, and target position information are input into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and the destination information and motion path information are sent to the corresponding mobile robot. The dispatch model assigns destinations and plans motion paths for the mobile robots based on a multi-agent reinforcement learning algorithm, so position assignment and real-time motion planning can be completed rapidly and the mobile robots can move simultaneously and complete the position assignment task without collision.
Drawings
Fig. 1 is a flowchart of a method for controlling a multi-mobile robot according to an embodiment of the present application.
Fig. 2 is a block diagram of a multi-mobile robot control device according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 4 is a flowchart of a multi-mobile robot dispatch model acquisition method according to an embodiment of the present application.
Fig. 5 is a block diagram of a multi-mobile robot dispatch model acquisition device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Referring to fig. 4, a method for acquiring a multi-mobile-robot dispatch model according to an embodiment of the present application includes the steps of:
A1. establishing a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; treating each mobile robot as an agent in the simulation scene and setting the movable directions of each agent;
A2. setting an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm;
A3. repeatedly training the Actor and Critic networks according to a preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves;
wherein the preset reward-and-punishment mechanism is: the reward R is based on the distance between an agent and its closest target position; a first negative value R1 is added when the agent collides, and a second negative value R2 is added for each exploration step taken before the target position is reached.
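As a concrete illustration, here is a minimal Python sketch of such a per-step reward. The distance shaping (the negative of the distance to the nearest target) and the magnitudes of R1 and R2 are assumptions for illustration; the patent fixes only their signs and roles.

```python
import math

R1 = -10.0  # first negative value, added on collision (magnitude assumed)
R2 = -0.1   # second negative value, added per exploration step (magnitude assumed)

def step_reward(agent_pos, targets, collided, reached_target):
    # Distance-based term: the closer the agent is to its nearest target,
    # the higher (less negative) the reward.
    d_min = min(math.hypot(agent_pos[0] - tx, agent_pos[1] - ty)
                for tx, ty in targets)
    reward = -d_min
    if collided:
        reward += R1          # penalty for colliding with an agent or obstacle
    if not reached_target:
        reward += R2          # cost of one more exploration step
    return reward
```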
Owing to this reward-and-punishment mechanism and training objective, the multi-mobile-robot dispatch model obtained by the method assigns a destination to each mobile robot and plans its motion path, and ensures that the robots can move simultaneously and reach their destinations without collision. In the multi-agent deep deterministic policy gradient algorithm, the Critic network continuously reinforces, over training iterations, the actions with the highest expected return; in the multi-robot position assignment scene (i.e., the simulation scene), such actions steadily drive the positions toward the optimal configuration of the scene, and the trained, converged model finally yields the optimal state sequence, that is, the optimal paths for the scene.
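For concreteness, the following is a minimal PyTorch sketch of the per-agent Actor and centralized Critic setup of step A2. The class names, layer sizes, and dimensions are illustrative assumptions, not the patented implementation; MADDPG conventionally gives each agent a decentralized actor and a critic conditioned on the joint observations and actions of all agents, which is what the sketch shows.

```python
import torch
import torch.nn as nn

N_ACTIONS = 5  # hold, up, down, left, right (the movable directions set in A1)

class Actor(nn.Module):
    """Maps one agent's local observation to a distribution over the 5 moves."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, N_ACTIONS),
        )
    def forward(self, obs):
        return torch.softmax(self.net(obs), dim=-1)

class Critic(nn.Module):
    """Centralized critic: scores the joint observations and actions of all agents."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# One Actor and one Critic per agent, as in step A2 (sizes assumed).
n_agents, obs_dim = 4, 10
actors = [Actor(obs_dim) for _ in range(n_agents)]
critics = [Critic(n_agents * obs_dim, n_agents * N_ACTIONS) for _ in range(n_agents)]
```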
In some embodiments, in step A3, the priority level of each mobile robot is used as the weight value of the corresponding agent; the weighted sum of the reward values obtained by all agents is taken as the total reward; and the average reward is calculated as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
By introducing these weights, mobile robots with higher priority levels are assigned first during task assignment. The user can preset the priority level of each mobile robot according to actual needs (for example, the levels can be represented by the numbers 1 to 10, with larger values indicating higher priority), so that the assignment result better matches the user's expectations.
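A minimal sketch of this weighting follows; the per-agent values are hypothetical, and dividing the weighted total by the number of agents follows the formula as reconstructed above.

```python
def total_and_average_reward(rewards, weights):
    # Total reward: weighted sum of per-agent rewards, sum_i w_i * r_i.
    total = sum(w * r for w, r in zip(weights, rewards))
    # Average reward: total divided by the number of agents.
    return total, total / len(rewards)

# Hypothetical values: three agents with priority levels 3, 1, 2.
rewards = [-1.2, -0.4, -0.9]
weights = [3, 1, 2]
total, average = total_and_average_reward(rewards, weights)
```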
In step A1, the movable directions set for each agent are: hold, move up, move down, move left, and move right. In some embodiments, when the Actor and Critic networks are trained in step A3, a change in an agent's motion state is realized by applying a force f to the agent in a specified direction. Therefore, when the obtained motion path comprises multiple steps (each change of motion state counting as one step), the motion direction and driving force of every step are obtained, and a mobile robot that needs to move along the planned path only has to execute the steps one by one according to the motion direction and driving force of each step, as sketched below.
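Under these assumptions, the action-to-force mapping might look like the following sketch; the force magnitude F and the coordinate convention are illustrative, since the patent does not fix them.

```python
F = 1.0  # driving-force magnitude per step (assumed)

# The five movable directions of step A1 mapped to a planar force f = (fx, fy).
ACTION_TO_FORCE = {
    "hold":  (0.0, 0.0),
    "up":    (0.0, F),
    "down":  (0.0, -F),
    "left":  (-F, 0.0),
    "right": (F, 0.0),
}

def step_command(action):
    """Return the force applied to the agent for one step of the planned path."""
    return ACTION_TO_FORCE[action]
```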
During the training of step A3, the distance between each agent and every other agent is calculated; if this distance is smaller than the sum of the collision radii of the two agents, they are considered to have collided, and the first negative value R1 is added to the agent's reward. Likewise, the distance between each agent and every obstacle is calculated; if it is smaller than the sum of the collision radii of the agent and the obstacle, the agent is considered to have collided with the obstacle, and R1 is added to its reward.
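A short sketch of this collision test, using only the center positions and collision radii described above:

```python
import math

def collided(pos_a, radius_a, pos_b, radius_b):
    # Two bodies collide when their center distance is below the sum of
    # their collision radii.
    return math.hypot(pos_a[0] - pos_b[0], pos_a[1] - pos_b[1]) < radius_a + radius_b

def collision_count(agent_pos, agent_radius, others):
    """Count collisions with the other agents and obstacles in `others`,
    given as (position, radius) pairs; each one adds R1 to the reward."""
    return sum(1 for pos, r in others
               if collided(agent_pos, agent_radius, pos, r))
```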
From the above, the multi-mobile-robot dispatch model acquired by this method has the following advantages:
1. The multi-agent reinforcement learning algorithm is applied to both position assignment and motion planning of the mobile robots, completing motion planning at the same time as position assignment; compared with the traditional approach of designing position assignment and motion planning separately, this is more efficient.
2. Weights are introduced into the reward accumulation, so the algorithm reflects the real-world priorities of the mobile robots, which gives it more practical significance.
3. For a complex environment, a corresponding obstacle model is built in the simulation scene using only each obstacle's center position and collision radius, so the method is suitable for complex scenes and has a wider application space and better scalability.
Referring to fig. 5, an embodiment of the present application further provides a multi-mobile-robot dispatch model acquisition device, which includes a first execution module 1, a second execution module 2, and a third execution module 3.
The first execution module 1 is configured to establish a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; to treat each mobile robot as an agent in the simulation scene; and to set the movable directions of each agent.
The second execution module 2 is configured to set an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm.
The third execution module 3 is configured to repeatedly train the Actor and Critic networks according to the preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves.
The preset reward-and-punishment mechanism is: the reward R is based on the distance between an agent and its closest target position; a first negative value R1 is added when the agent collides, and a second negative value R2 is added for each exploration step taken before the target position is reached.
In some embodiments, the third execution module 3 uses the priority level of each mobile robot as the weight value of the corresponding agent, takes the weighted sum of the reward values obtained by all agents as the total reward, and calculates the average reward as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
From the above, the multi-mobile-robot dispatch model acquisition device has the following advantages:
1. The multi-agent reinforcement learning algorithm is applied to both position assignment and motion planning of the mobile robots, completing motion planning at the same time as position assignment; compared with the traditional approach of designing position assignment and motion planning separately, this is more efficient.
2. Weights are introduced into the reward accumulation, so the algorithm reflects the real-world priorities of the mobile robots, which gives it more practical significance.
3. For a complex environment, a corresponding obstacle model is built in the simulation scene using only each obstacle's center position and collision radius, so the method is suitable for complex scenes and has a wider application space and better scalability.
In addition, an embodiment of the present application further provides an electronic device comprising a processor and a memory, where the memory stores a computer program and the processor executes the above multi-mobile-robot dispatch model acquisition method by calling the computer program stored in the memory.
The processor is electrically connected with the memory. The processor is the control center of the electronic device: it connects the various parts of the whole device through various interfaces and lines, and performs the functions of the device and processes its data by running or calling the computer program stored in the memory and the data stored in the memory, thereby monitoring the electronic device as a whole.
The memory may be used to store the computer program and data. The stored computer program contains instructions executable by the processor and may constitute various functional modules. The processor executes various functional applications and data processing by calling the computer program stored in the memory.
In this embodiment, the processor of the electronic device loads instructions corresponding to the processes of one or more computer programs into the memory and executes the stored computer programs to implement the following functions: establishing a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; treating each mobile robot as an agent in the simulation scene and setting the movable directions of each agent; setting an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm; and repeatedly training the Actor and Critic networks according to the preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves; where the preset reward-and-punishment mechanism is: the reward R is based on the distance between an agent and its closest target position; a first negative value R1 is added when the agent collides, and a second negative value R2 is added for each exploration step taken before the target position is reached.
From the above, the electronic device has the following advantages:
1. The multi-agent reinforcement learning algorithm is applied to both position assignment and motion planning of the mobile robots, completing motion planning at the same time as position assignment; compared with the traditional approach of designing position assignment and motion planning separately, this is more efficient.
2. Weights are introduced into the reward accumulation, so the algorithm reflects the real-world priorities of the mobile robots, which gives it more practical significance.
3. For a complex environment, a corresponding obstacle model is built in the simulation scene using only each obstacle's center position and collision radius, so the method is suitable for complex scenes and has a wider application space and better scalability.
Referring to fig. 1, an embodiment of the present application further provides a method for controlling multiple mobile robots, including the steps of:
S1. acquiring the initial position information and identity information of each mobile robot;
S2. acquiring target position information;
S3. acquiring a multi-mobile-robot dispatch model, obtained by the dispatch model acquisition method described above;
S4. inputting the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot;
S5. sending the destination information and motion path information to the corresponding mobile robot.
The multi-mobile-robot control method runs on the control server of the mobile robots.
The multi-mobile-robot dispatch model realizes position assignment and motion path planning for multiple mobile robots, so position assignment and real-time motion planning can be completed rapidly and the robots can move simultaneously and complete the position assignment task without collision. When weights are introduced into the reward accumulation of the dispatch model, the algorithm reflects the real-world priorities of the mobile robots, which gives it more practical significance; the method also suits complex scenes and has a wider application space and better scalability.
The identity information may be a user-defined number of the mobile robot, or the MAC address of the robot's communication module. In the multi-mobile-robot dispatch model, each mobile robot corresponds to an agent through its identity information.
In some embodiments, during training of the multi-mobile-robot dispatch model, a change in an agent's motion state is realized by applying a force f to the agent in a specified direction; the obtained motion path thus comprises multiple steps (each change of motion state counting as one step), and the motion direction and driving force of every step are available. Correspondingly, in step S5, the motion path information may be sent step by step to the corresponding mobile robot, the information for each step comprising a motion direction and a driving force. The robot reaches its assigned destination along the corresponding path simply by executing the steps one by one according to the motion direction and driving force of each step; the logic is simple and easy to implement.
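As an illustration of step-by-step transmission, a hypothetical per-step message might look as follows; the field names and the transport mechanism are assumptions, not the patent's protocol.

```python
from dataclasses import dataclass

@dataclass
class StepCommand:
    robot_id: str   # identity information of the addressed robot
    direction: str  # one of: hold, up, down, left, right
    force: float    # driving force for this step

def send_path(robot_id, path, transmit):
    # Send the planned path one step at a time, as described for step S5;
    # `transmit` stands in for whatever channel the control server uses.
    for direction, force in path:
        transmit(StepCommand(robot_id, direction, force))
```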
For a pre-trained multi-mobile-robot dispatch model, the initial position of each mobile robot must not differ too much from the initial positions used during training; otherwise, retraining is required. Thus, in some preferred embodiments, the method further includes, before step S1:
S0. sending a guide instruction to each mobile robot to guide it to a preset initial position.
The guide instruction comprises a lookup table between mobile robot identity information and preset initial position coordinates, together with a guide trigger signal. After a mobile robot recognizes the trigger signal, it queries the lookup table with its own identity information to obtain its initial position coordinates, moves to that initial position, and, upon arrival, sends its initial position information and identity information to the server.
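A minimal sketch of such a guide instruction and the robot-side handling follows; the trigger name, robot identifiers, and coordinates are illustrative assumptions.

```python
# Guide instruction: a trigger signal plus a lookup table from robot identity
# information to preset initial coordinates.
GUIDE_INSTRUCTION = {
    "trigger": "GUIDE_START",
    "initial_positions": {
        "robot-01": (0.0, 0.0),
        "robot-02": (1.0, 0.0),
        "robot-03": (0.0, 1.0),
    },
}

def on_guide(robot_id, instruction, move_to, report):
    """Robot-side handling: on the trigger signal, look up the preset start
    position for this robot, move there, and report back to the server."""
    if instruction["trigger"] == "GUIDE_START":
        x, y = instruction["initial_positions"][robot_id]
        move_to(x, y)
        report(robot_id, (x, y))  # send id + initial position to the server
```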
According to the multi-mobile-robot control method, the initial position information and identity information of each mobile robot are acquired; target position information is acquired; a multi-mobile-robot dispatch model is acquired; the initial position information, identity information, and target position information are input into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and the destination information and motion path information are sent to the corresponding mobile robot. The dispatch model assigns destinations and plans motion paths for the mobile robots based on a multi-agent reinforcement learning algorithm, so position assignment and real-time motion planning can be completed rapidly and the robots can move simultaneously and complete the position assignment task without collision.
Referring to fig. 2, an embodiment of the present application further provides a multi-mobile-robot control device, which includes a first acquisition module 91, a second acquisition module 92, a third acquisition module 93, a fourth acquisition module 94, and a first sending module 95.
The first acquisition module 91 is configured to acquire the initial position information and identity information of each mobile robot.
The second acquisition module 92 is configured to acquire target position information.
The third acquisition module 93 is configured to acquire a multi-mobile-robot dispatch model obtained by the dispatch model acquisition method described above.
The fourth acquisition module 94 is configured to input the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot.
The first sending module 95 is configured to send the destination information and motion path information to the corresponding mobile robot.
In some embodiments, the first sending module 95 sends the motion path information to the corresponding mobile robot step by step, the motion path information of each step comprising a motion direction and a driving force.
As can be seen from the above, the multi-mobile-robot control device acquires the initial position information and identity information of each mobile robot; acquires target position information; acquires a multi-mobile-robot dispatch model; inputs the initial position information, identity information, and target position information into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and sends the destination information and motion path information to the corresponding mobile robot. The dispatch model assigns destinations and plans motion paths based on a multi-agent reinforcement learning algorithm, so position assignment and real-time motion planning can be completed rapidly and the robots can move simultaneously and complete the position assignment task without collision.
Referring to fig. 3, an embodiment of the present application further provides an electronic device 100 comprising a processor 101 and a memory 102, where the memory 102 stores a computer program and the processor 101 executes the above multi-mobile-robot control method by calling the computer program stored in the memory 102.
The processor 101 is electrically connected with the memory 102. The processor 101 is the control center of the electronic device 100: it connects the various parts of the whole device through various interfaces and lines, and performs the functions of the device and processes its data by running or calling the computer program stored in the memory 102 and the data stored in the memory 102, thereby monitoring the electronic device as a whole.
The memory 102 may be used to store the computer program and data. The stored computer program contains instructions executable by the processor and may constitute various functional modules. The processor 101 executes various functional applications and data processing by calling the computer program stored in the memory 102.
In this embodiment, the processor 101 of the electronic device 100 loads instructions corresponding to the processes of one or more computer programs into the memory 102 and executes the stored computer programs to implement the following functions: acquiring the initial position information and identity information of each mobile robot; acquiring target position information; acquiring a multi-mobile-robot dispatch model; inputting the initial position information, identity information, and target position information into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and sending the destination information and motion path information to the corresponding mobile robot.
From the above, the electronic device acquires the initial position information and identity information of each mobile robot; acquires target position information; acquires a multi-mobile-robot dispatch model; inputs the initial position information, identity information, and target position information into the dispatch model to obtain the destination information and motion path information assigned to each mobile robot; and sends the destination information and motion path information to the corresponding mobile robot. The dispatch model assigns destinations and plans motion paths for the mobile robots based on a multi-agent reinforcement learning algorithm, so position assignment and real-time motion planning can be completed rapidly and the robots can move simultaneously and complete the position assignment task without collision.
In summary, although the present invention has been described with reference to the preferred embodiments, it is not limited thereto, and various modifications and variations can be made by those skilled in the art without departing from the spirit and scope of the present invention.
Claims (8)
1. A method for acquiring a multi-mobile-robot dispatch model, comprising the steps of:
A1. establishing a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and preset priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; treating each mobile robot as an agent in the simulation scene and setting the movable directions of each agent;
A2. setting an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm;
A3. repeatedly training the Actor and Critic networks according to a preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves;
wherein the preset reward-and-punishment mechanism is: the reward is based on the distance between an agent and its closest target position; a first negative value is added when the agent collides, and a second negative value is added for each exploration step taken before the target position is reached;
and wherein, in step A3, the priority level of each mobile robot is used as the weight value of the corresponding agent; the weighted sum of the reward values obtained by all agents is taken as the total reward; and the average reward is calculated as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
2. A method of controlling multiple mobile robots, comprising the steps of:
S1. acquiring the initial position information and identity information of each mobile robot;
S2. acquiring target position information;
S3. acquiring a multi-mobile-robot dispatch model, obtained by the multi-mobile-robot dispatch model acquisition method of claim 1;
S4. inputting the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot;
S5. sending the destination information and motion path information to the corresponding mobile robot.
3. The multi-mobile-robot control method according to claim 2, wherein in step S5 the motion path information is sent step by step to the corresponding mobile robot, the motion path information of each step comprising motion direction information and driving force information.
4. The multi-mobile-robot control method according to claim 2, further comprising, before step S1:
S0. sending a guide instruction to each mobile robot to guide it to a preset initial position.
5. A multi-mobile-robot dispatch model acquisition device, comprising:
a first execution module, configured to establish a multi-agent reinforcement learning simulation scene according to the initial position information, collision radii, and preset priority level information of a plurality of mobile robots, the obstacle center positions and collision radii of the application scene, and the target position information; to treat each mobile robot as an agent in the simulation scene; and to set the movable directions of each agent;
a second execution module, configured to set an Actor network and a Critic network for each agent based on the multi-agent deep deterministic policy gradient algorithm;
a third execution module, configured to repeatedly train the Actor and Critic networks according to a preset reward-and-punishment mechanism, with maximizing the total reward of all agents as the objective, until every agent reaches a target position and the average reward obtained by the agents no longer improves;
wherein the preset reward-and-punishment mechanism is: the reward is based on the distance between an agent and its closest target position; a first negative value is added when the agent collides, and a second negative value is added for each exploration step taken before the target position is reached;
and wherein the third execution module uses the priority level of each mobile robot as the weight value of the corresponding agent, takes the weighted sum of the reward values obtained by all agents as the total reward, and calculates the average reward as
\bar{R} = \frac{1}{n} \sum_{i=1}^{n} w_i r_i
where \bar{R} is the average reward, r_i is the reward value of the i-th agent, w_i is the weight value of the i-th agent, and n is the number of agents.
6. A multi-mobile-robot control device, comprising:
a first acquisition module, configured to acquire the initial position information and identity information of each mobile robot;
a second acquisition module, configured to acquire target position information;
a third acquisition module, configured to acquire a multi-mobile-robot dispatch model obtained by the multi-mobile-robot dispatch model acquisition method of claim 1;
a fourth acquisition module, configured to input the initial position information, identity information, and target position information into the multi-mobile-robot dispatch model to obtain the destination information and motion path information assigned to each mobile robot;
and a first sending module, configured to send the destination information and motion path information to the corresponding mobile robot.
7. The multi-mobile-robot control device of claim 6, wherein the first sending module sends the motion path information to the corresponding mobile robot step by step, the motion path information of each step comprising motion direction information and driving force information.
8. An electronic device comprising a processor and a memory, the memory storing a computer program, the processor being configured to execute the multi-mobile-robot control method of any one of claims 2 to 4 by calling the computer program stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010675357.5A CN111897327B (en) | 2020-07-14 | 2020-07-14 | Multi-mobile robot control/dispatch model acquisition method and device and electronic equipment
Publications (2)
Publication Number | Publication Date
---|---
CN111897327A (en) | 2020-11-06
CN111897327B (en) | 2024-02-23
Family
ID=73191751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010675357.5A Active CN111897327B (en) | 2020-07-14 | 2020-07-14 | Multi-mobile robot control/dispatch model acquisition method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111897327B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114516044B (en) * | 2020-11-20 | 2024-12-20 | 炬星科技(深圳)有限公司 | Robot path planning method, device and storage medium |
CN113459109B (en) * | 2021-09-03 | 2021-11-26 | 季华实验室 | Mechanical arm path planning method and device, electronic equipment and storage medium |
CN114454162B (en) * | 2022-01-10 | 2023-05-26 | 广东技术师范大学 | Mobile robot complex intersection anti-collision method and system |
CN117527570B (en) * | 2023-12-18 | 2024-05-17 | 无锡北微传感科技有限公司 | Sensor cluster position optimization method based on edge reinforcement learning |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN107704980A (en) * | 2017-05-24 | 2018-02-16 | 重庆大学 | Multi-agent autonomous decision-making method for handling newly added express pickup demands
CN110084375A (en) * | 2019-04-26 | 2019-08-02 | 东南大学 | Hierarchical division framework based on deep reinforcement learning
CN110632931A (en) * | 2019-10-09 | 2019-12-31 | 哈尔滨工程大学 | Collision avoidance planning method for mobile robots based on deep reinforcement learning in dynamic environments
CN110794842A (en) * | 2019-11-15 | 2020-02-14 | 北京邮电大学 | Reinforcement learning path planning algorithm based on potential field
CN110991972A (en) * | 2019-12-14 | 2020-04-10 | 中国科学院深圳先进技术研究院 | Cargo transportation system based on multi-agent reinforcement learning
Also Published As
Publication number | Publication date |
---|---|
CN111897327A (en) | 2020-11-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111897327B (en) | Multi-mobile robot control/dispatch model acquisition method and device and electronic equipment | |
Yamada et al. | Motion planner augmented reinforcement learning for robot manipulation in obstructed environments | |
US11235467B2 (en) | System and method for trajectory planning for manipulators in robotic finishing applications | |
WO2019113067A2 (en) | Viewpoint invariant visual servoing of robot end effector using recurrent neural network | |
CN109407603B (en) | Method and device for controlling mechanical arm to grab object | |
JP7517225B2 (en) | Trajectory generation system, trajectory generation method, and program | |
CN101092032A (en) | Controlling the interactive behavior of a robot | |
WO2021178872A1 (en) | Trajectory optimization using neural networks | |
CN107457780B (en) | Method and device for controlling mechanical arm movement, storage medium and terminal equipment | |
Baba et al. | Collision avoidance planning of a robot manipulator by using genetic algorithm. A consideration for the problem in which moving obstacles and/or several robots are included in the workspace | |
Uchibe et al. | Cooperative behavior acquisition in multi-mobile robots environment by reinforcement learning based on state vector estimation | |
US12032343B2 (en) | Control system for controlling a machine using a control agent with parallel training of the control agent | |
Gao et al. | Effectively rearranging heterogeneous objects on cluttered tabletops | |
Imtiaz et al. | Implementing Robotic Pick and Place with Non-visual Sensing Using Reinforcement Learning | |
CN115249333B (en) | Grabbing network training method, grabbing network training system, electronic equipment and storage medium | |
CN114683280B (en) | Object control method and device, storage medium and electronic equipment | |
CN117138348A (en) | AI model construction method, agent control method, device, and storage medium | |
CN113218399A (en) | Maze navigation method and device based on multi-agent layered reinforcement learning | |
Lyu et al. | Asynchronous, option-based multi-agent policy gradient: A conditional reasoning approach | |
Wang et al. | The coordination of intelligent robots: A case study | |
JP2778922B2 (en) | How to determine the work order and work sharing of multiple robots | |
CN119141536A (en) | Multi-mechanical arm deep reinforcement learning control method and device for regenerated article sorting | |
US20250068966A1 (en) | Human-in-the-loop task and motion planning for imitation learning | |
Gómez et al. | Learning Manipulation Tasks: A multi-agent approach Technical Report No. CCC-23-004 | |
CN119501923A (en) | Human-in-the-loop tasks and motion planning in imitation learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||