CN116430891A - Deep reinforcement learning method oriented to multi-agent path planning environment - Google Patents
- Publication number
- CN116430891A (application number CN202310175856.1A)
- Authority
- CN
- China
- Prior art keywords
- network
- agent
- curiosity
- path planning
- environment
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the technical field of path planning and provides a deep reinforcement learning method and system for multi-agent path planning. The method and system comprise the following steps: building a quadrotor unmanned aerial vehicle model and a path planning simulation system; constructing a deep reinforcement learning base network and initializing its basic parameters; building a non-global curiosity network to improve the exploration ability and level of the agents; and building an attention network to accelerate and stabilize the training process of the agents and to strengthen cooperation between agents. The invention provides a deep reinforcement learning algorithm for multi-agent path planning that combines a curiosity mechanism and an attention mechanism, establishes a new agent reward distribution mechanism, balances agent exploration and cooperation, and effectively improves the stability and planning level of multi-agent path planning.
Description
Technical Field
The invention relates to the path planning problem of multiple agents, in particular to deep reinforcement learning for multi-agent path planning and to the problems of insufficient agent exploration and unreasonable reward value distribution.
Background
Path planning is a technique widely used in robotics, autonomous vehicles, virtual reality, simulation systems, and other fields. Its main objective is to find an optimal path from a start point to an end point in a given environment that meets specific task requirements, such as avoiding obstacles and collisions. To better meet practical demands, research on path planning continues to advance. In recent years, with the continuous development of artificial intelligence, techniques such as deep learning and reinforcement learning have gradually been introduced into the field of path planning, greatly improving its efficiency and accuracy. By exploiting the "exploration and exploitation" characteristic of reinforcement learning, good results can be obtained more quickly than with conventional methods when planning paths in complex environments. In addition, multi-agent reinforcement learning better matches the characteristics of many real-world path planning scenarios: for example, in path planning for a group of unmanned aerial vehicles, interaction and cooperation among the agents must be considered, and multi-agent methods can control the unmanned aerial vehicles cooperatively to reach a globally optimal solution.
Deep reinforcement learning combines deep learning and reinforcement learning and can fully exploit the strengths of deep learning to solve more complex problems. In conventional reinforcement learning methods, processing environment information typically requires manually designed feature vectors. Deep reinforcement learning algorithms, however, inherit the strong environment-perception capability of deep learning (e.g., convolutional and fully connected layers) and can directly process high-dimensional environment observations and extract their features.
At present, research on path planning in multi-agent environments is still scarce, and problems such as unreasonable reward distribution, difficulty in convergence, and complex relationships among agents remain to be studied and solved.
Disclosure of Invention
The invention aims to solve the above problems and provides a multi-agent-oriented deep reinforcement learning method for path planning, which addresses the difficulty that, because environmental rewards are sparse in a path planning environment, agents converge slowly or converge to a locally optimal solution.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a deep reinforcement learning method for a multi-agent path planning environment, which comprises the following steps:
step 1: constructing a three-dimensional path planning simulation system of the four-rotor unmanned aerial vehicle by means of a Pybullet development kit;
step 2: completing the design of a deep reinforcement learning algorithm based on a non-global curiosity network and an attention module, and initializing each agent;
step 3: constructing an environment rewarding function according to a path planning task target, and setting a target to be reached according to rules abstracted by a simulation environment;
step 4: setting the maximum iteration round and other parameters;
step 5: according to the Pybullet development kit, acquiring environment observation information in a simulation environment and communication information between the agents in the same team, processing state information, selecting actions to be executed, acquiring curiosity rewarding values of the agents, inputting the curiosity rewarding values into an attention network for further processing, and acquiring final rewarding values;
step 6: updating the parameters of the evaluation network and the policy network;
step 7: acquiring new environment observation information, acquiring experience playback quadruples and storing the experience playback quadruples in a playback experience buffer;
step 8: and repeatedly executing the steps 5-7, and updating the neural network in the multi-agent reinforcement learning algorithm until the iteration number reaches the maximum iteration number, thereby realizing the path planning task in the simulation environment.
Further, the step 1 includes:
In the Pybullet simulation environment, N agents are defined; each agent is identical except for its initial location in the environment. The environment comprises a set of local observations O = {o_1, ..., o_N}, a set of actions A = {a_1, ..., a_N}, a set of states S, and a state transition function T: S × A → S; each agent i obtains a local observation o_i.
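A minimal sketch of this environment tuple, assuming a Python implementation, is given below; the class name and default values (three agents, 40-dimensional observations, 3-dimensional actions, taken from the embodiment described later) are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class MultiAgentPathPlanningEnv:
    """Container for the tuple (O, A, S, T) described above (sketch)."""
    n_agents: int = 3          # identical agents; only initial positions differ
    obs_dim: int = 40          # dimension of each local observation o_i
    act_dim: int = 3           # dimension of each action a_i
    state: np.ndarray = field(default_factory=lambda: np.zeros(12))

    def observe(self, agent_index: int) -> np.ndarray:
        """Return the local observation o_i of agent i (placeholder)."""
        return np.zeros(self.obs_dim)

    def step(self, actions: List[np.ndarray]) -> np.ndarray:
        """State transition T: apply the joint action and return the next state."""
        # placeholder transition; a real system advances the physics here
        return self.state
```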
Further, the step 3 includes:
The goal to be achieved is: the unmanned aerial vehicle avoids all obstacles without crashing and successfully reaches the target position.
Further, the step 4 includes:
The attention module acts on the non-global curiosity module and is used to control how much each agent's curiosity value contributes to achieving the overall goal.
According to an embodiment of the multi-agent path planning oriented deep reinforcement learning method of the present invention, the simulation module is further configured to:
N agents are defined; each agent is identical except for its initial location in the environment. The environment comprises a set of local observations O = {o_1, ..., o_N}, a set of actions A = {a_1, ..., a_N}, a set of states S, and a state transition function T: S × A → S; each agent i obtains a local observation o_i.
According to an embodiment of the multi-agent path planning oriented deep reinforcement learning method of the present invention, the attention module is further configured to:
the attention module acts on the curiosity module, processes the curiosity reward value generated by the curiosity module and is used for improving the effect of the curiosity reward value on convergence of the whole training.
According to an embodiment of the multi-agent path planning oriented deep reinforcement learning method of the present invention, the non-global curiosity module is further configured to:
Each agent computes an exploration bonus based on its local observation o_i to generate a curiosity reward.
According to an embodiment of the multi-agent path planning oriented deep reinforcement learning method of the present invention, the reward function construction module is further configured to:
The goal to be achieved is: each agent avoids all obstacles without crashing and successfully reaches the position of the target point.
Compared with the prior art, the invention has the following advantages:
1) The non-global curiosity module adopted by the invention solves the problem that agent path planning converges to a single path in a complex environment, improves the exploration level of the agents, and efficiently optimizes the multi-agent game strategy;
2) In the invention, the attention module acts on the non-global curiosity module; the curiosity reward acquired by each individual agent is further optimized by the attention mechanism according to the global environment observation, which improves convergence stability;
3) The invention targets cooperative multi-agent systems and realizes multi-agent path planning in complex obstacle environments.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is an overall schematic of a simulation environment employed by the present invention;
FIG. 3 is a process framework diagram of a multi-agent reinforcement learning algorithm set forth in the present invention;
FIG. 4 is a path planning result diagram (top view) of the algorithm in the simulation environment.
Detailed Description
The present invention will be further described in detail with reference to the drawings and the following examples, wherein like reference numerals refer to the same or similar elements, in order to make the objects, technical solutions and advantages of the present invention more apparent. However, the following specific examples are given for the purpose of illustration only and are not intended to limit the scope of the present invention.
Referring to fig. 1, 2 and 3, the method of the embodiment of the present invention operates as follows:
step 1: the four-rotor drone was modeled using ROS. Establishing an appropriate coordinate system to describe the movement of the unmanned aerial vehicle in space, usually using an inertial coordinate system and the coordinate system of the unmanned aerial vehicle itself; describing the movement of the unmanned aerial vehicle in three directions (longitudinal, transverse and vertical), and adjusting the power output and the moment of the four motors according to the state of the unmanned aerial vehicle; describing the rotation state of the unmanned aerial vehicle by adopting a rotation matrix or quaternion according to aerodynamics; programming an ROS program to simulate the description of the four-rotor unmanned aerial vehicle; the method comprises the steps of importing a quadrotor unmanned aerial vehicle into an environment, uniformly arranging 40 radar rays around the unmanned aerial vehicle, identifying whether the surrounding environment is a target object, importing models of objects such as columns and the like into the environment, randomly generating a fixed number of models in the environment, and using the models as barriers; the sphere models with the same number as the unmanned aerial vehicles are imported as target points and randomly distributed behind the obstacle.
Step 2: the design of a non-global curiosity network and an attention module is completed, and the non-global curiosity network and the attention module are introduced into a deep reinforcement learning algorithm; according to the dimensions of the state space and the action space of the four-rotor unmanned aerial vehicle (the dimension of the observation space of each intelligent body is 40 and the dimension of the action space is 3), the input and output dimensions of an algorithm network are adjusted, and an improved deep reinforcement learning algorithm is completed; using the algorithm as intelligentInitializing respective networks by a body; according to the strategyObtaining the selection action in the action space of the intelligent agentAnd obtains rewards by interacting with the simulation environment:。
Step 3: Design the reward function of the environment. When the quadrotor unmanned aerial vehicle moves closer to an obstacle, a negative reward r_obs is given that grows in magnitude as the distance d between the vehicle and the obstacle decreases within the obstacle's influence range ρ; when the quadrotor unmanned aerial vehicle is destroyed by colliding with an obstacle or by excessive attitude adjustment, a negative reward r_crash is given; when the agent successfully reaches the target point, a positive reward r_goal is given; in addition, the non-global curiosity network outputs a curiosity reward r_cur. The total reward is therefore r = r_obs + r_crash + r_goal + r_cur.
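A hedged sketch of this reward shaping is shown below; the exact coefficients and functional forms are not recoverable from the text, so the magnitudes used here are illustrative assumptions.

```python
# Sketch of the environment reward described above; all constants are assumed.
def environment_reward(dist_to_obstacle, obstacle_range, crashed, reached_goal):
    r = 0.0
    if dist_to_obstacle < obstacle_range:              # near an obstacle: penalty
        r -= (obstacle_range - dist_to_obstacle) / obstacle_range
    if crashed:                                        # collision or excessive attitude
        r -= 10.0
    if reached_goal:                                   # target point reached
        r += 10.0
    return r

# Total reward = environment reward + curiosity reward from the non-global module.
def total_reward(env_r, curiosity_r):
    return env_r + curiosity_r
```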
step 4: setting the maximum round as 1000, and setting the size of the experience playback buffer zone asSoft update parameters,Set to 256.
Step 5: the algorithm takes the form of an Actor-Critic framework, which includes Actor (Actor) networks and critics (Critic) networks. The actor network is responsible for generating actions of the unmanned aerial vehicle and interacting with the environment, and the critique network is responsible for evaluating states and performances of the actions and guiding the strategy function to generate actions of the next stage; both networks adopt a dual-network structure, including a target network and an estimation network; according to the observation information of each intelligent agent at the moment, the action executed by the processing of the Actor network is obtainedInteracting with the environment, calculating the curiosity rewards of each, inputting the curiosity rewards of all the agents into the attention network, weighting rewards, and outputting the curiosity rewards finally obtained by each agentThe method comprises the steps of carrying out a first treatment on the surface of the Adding the curiosity rewards and the environmental rewards to obtain the final rewards of each agent in the step.
In addition, the "non-global" nature of the non-global curiosity reward module is embodied as follows: when calculating curiosity rewards, a single agent does not treat all other agents as part of the environment, but selects, according to the distance between agents, the state information of only those agents that influence it as its environment information.
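The distance-based selection can be illustrated as follows; the distance threshold is an assumed parameter, since the text does not state how "influence" is thresholded.

```python
import numpy as np

def select_neighbor_states(own_pos, all_positions, all_states, radius=5.0):
    """Return the states of agents lying within `radius` of this agent (sketch)."""
    neighbors = []
    for pos, state in zip(all_positions, all_states):
        d = np.linalg.norm(np.asarray(pos) - np.asarray(own_pos))
        if 0.0 < d <= radius:          # exclude the agent itself (d == 0)
            neighbors.append(state)
    return neighbors
```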
Wherein, the process of calculating curiosity reward value is as follows:
First, the current state s_t, the current action a_t and the next true state s_{t+1} are input into the curiosity module. The curiosity module comprises four sub-modules: two feature-extraction network modules used to extract the features φ(s_t) and φ(s_{t+1}) of the states; a forward model (Forward Model) used to predict the feature φ_pred(s_{t+1}) obtained by executing a_t in state s_t; and an inverse model used to estimate the action a_pred from φ(s_t) and φ(s_{t+1}). The curiosity reward is calculated from the similarity between φ(s_{t+1}) and φ_pred(s_{t+1}).
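A compact sketch of such a curiosity module in PyTorch is given below; it follows the forward/inverse-model structure described above. Whether the two feature-extraction modules share weights, and all layer sizes, are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CuriosityModule(nn.Module):
    def __init__(self, obs_dim=40, act_dim=3, feat_dim=32):
        super().__init__()
        # feature-extraction network: phi(s) (shared here for simplicity)
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, feat_dim))
        # forward model: predicts phi(s_{t+1}) from phi(s_t) and a_t
        self.forward_model = nn.Sequential(nn.Linear(feat_dim + act_dim, 64), nn.ReLU(),
                                           nn.Linear(64, feat_dim))
        # inverse model: estimates a_t from phi(s_t) and phi(s_{t+1})
        self.inverse_model = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(),
                                           nn.Linear(64, act_dim))

    def forward(self, s, a, s_next):
        phi, phi_next = self.encoder(s), self.encoder(s_next)
        phi_next_pred = self.forward_model(torch.cat([phi, a], dim=-1))
        a_pred = self.inverse_model(torch.cat([phi, phi_next], dim=-1))
        # curiosity reward: forward-model prediction error on next-state features
        r_curiosity = 0.5 * F.mse_loss(phi_next_pred, phi_next.detach(),
                                       reduction="none").sum(dim=-1)
        return r_curiosity, phi_next_pred, a_pred
```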
In addition, the attention network processes curiosity rewards as follows:
First, the sequence X = [x_1, ..., x_N] of curiosity rewards of the agents is input into the attention network, and the importance of the curiosity rewards of different agents is learned through neural network processing. Specifically, an attention variable z is used to represent the index position of the item selected from the input, given a query variable q; given q and X, the probability of selecting the i-th input is
α_i = p(z = i | X, q) = softmax(s(x_i, q)),
where α_i is called the attention distribution and s(x_i, q) is the attention scoring function, which may be computed, for example, with a scaled dot-product or additive scoring model.
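As one possible realization of this attention weighting (a sketch, not the patent's implementation), the module below embeds each scalar curiosity reward as a key, embeds a query (for example a summary of the global observation, here reduced to a scalar for brevity), scores them with a scaled dot product, and re-weights the rewards with the resulting softmax distribution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CuriosityAttention(nn.Module):
    """Weights each agent's curiosity reward by a learned attention distribution."""
    def __init__(self, key_dim=16):
        super().__init__()
        self.key = nn.Linear(1, key_dim)     # embed each scalar curiosity reward x_i
        self.query = nn.Linear(1, key_dim)   # embed the query q (assumed scalar summary)

    def forward(self, curiosity_rewards, query):
        # curiosity_rewards: (n_agents, 1); query: (1, 1)
        keys = self.key(curiosity_rewards)                  # (n_agents, key_dim)
        q = self.query(query)                               # (1, key_dim)
        scores = keys @ q.t() / keys.shape[-1] ** 0.5       # scaled dot-product s(x_i, q)
        alpha = F.softmax(scores, dim=0)                    # attention distribution alpha_i
        return alpha * curiosity_rewards                    # re-weighted curiosity rewards
```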
Step 6: The specific updating process of each network is as follows. The Critic networks are updated according to the temporal-difference error by minimizing the loss
L(θ^Q) = (1/N) Σ_j ( y_j − Q(s_j, a_j | θ^Q) )²,
where N is the number of samples and Q(s_j, a_j | θ^Q) is, with the function parameters θ^Q fixed, the expected return the agent obtains from the state-action pair (s_j, a_j) by the end of the episode; the target y_j is expressed as
y_j = r_j + γ Q'(s_{j+1}, μ'(s_{j+1}) | θ^{Q'}),
where r_j is the reward value of the j-th step, γ is the discount factor that balances future rewards against current rewards, and μ'(s_{j+1}) is the action value output by the (target) Actor network.
The parameters of the Actor network are updated in a gradient manner according to the Critic network's evaluation of the action:
∇_{θ^μ} J ≈ (1/N) Σ_j ∇_a Q(s, a | θ^Q) |_{s = s_j, a = μ(s_j)} ∇_{θ^μ} μ(s | θ^μ) |_{s = s_j},
where, in addition to the parameters described above, J is the objective function to be maximized, ∇_{θ^μ} J is its gradient with the parameters θ^μ taken as the variables of differentiation, and μ is the policy function.
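The update rules above correspond to a DDPG-style actor-critic update with soft target updates; the sketch below, assuming PyTorch and networks like those defined earlier, shows one such update step. Function and variable names are illustrative, not taken from the patent.

```python
import torch
import torch.nn.functional as F

def update_agent(batch, actor, critic, actor_target, critic_target,
                 actor_opt, critic_opt, gamma=0.99, tau=0.01):
    s, a, r, s_next = batch                  # tensors sampled from the replay buffer; r is (B, 1)

    # Critic update: minimize the TD error against the target y_j
    with torch.no_grad():
        y = r + gamma * critic_target(s_next, actor_target(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update: gradient ascent on the Critic's evaluation of the Actor's action
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the target networks with coefficient tau
    for net, target in ((actor, actor_target), (critic, critic_target)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```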
Step 7: The quadruple (s_t, a_t, r_t, s_{t+1}) obtained in the preceding steps is stored in the experience replay buffer. The experience replay process is as follows: a learning sequence is sampled from the experience replay buffer pool; the temporal-difference error (TD_error) is calculated as
δ_j = r_j + γ Q'(s_{j+1}, μ'(s_{j+1}) | θ^{Q'}) − Q(s_j, a_j | θ^Q);
the stochastic gradient is formed from the sampled TD errors, and the network parameters are updated along this gradient, θ ← θ − η ∇_θ L(θ), where η is the learning rate.
The algorithm adopts an importance-sampling experience replay strategy: the sampling probability of each stored experience is ordered according to its temporal-difference error, so that the larger the temporal-difference error, the larger the probability of that experience being sampled.
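A sketch of such a TD-error-prioritized replay buffer is given below; the exponent, capacity and epsilon are assumed values, and the interface names are illustrative rather than the patent's.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Sketch of TD-error-based prioritized sampling as described above."""
    def __init__(self, capacity=100_000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.storage, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0); self.priorities.pop(0)
        self.storage.append(transition)                # (s, a, r, s_next)
        self.priorities.append(abs(td_error) + 1e-6)   # larger TD error -> higher priority

    def sample(self, batch_size=256):
        probs = np.array(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        return [self.storage[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + 1e-6
```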
Step 8: the iteration is continued to the set maximum number of iterations according to steps 5-7.
Referring to fig. 4:
The three-dimensional simulation environment is projected onto a two-dimensional plane; in the figure, a blue square represents the position of an obstacle, a red circle represents the position of a target point, and the three irregular lines represent the planned paths of the three quadrotor unmanned aerial vehicles.
While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood and appreciated by those skilled in the art.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (5)
1. A deep reinforcement learning method and system oriented to a multi-agent path planning environment, characterized by comprising the following steps:
step 1: constructing a three-dimensional path planning simulation system of the four-rotor unmanned aerial vehicle by means of a Pybullet development kit;
step 2: completing the design of a deep reinforcement learning algorithm based on a non-global curiosity network and an attention module, and initializing each agent;
step 3: constructing an environment rewarding function according to a path planning task target, and setting a target to be reached according to rules abstracted by a simulation environment;
step 4: setting the maximum iteration round and other parameters;
step 5: according to the Pybullet development kit, acquiring environment observation information in a simulation environment and communication information between the agents in the same team, processing state information, selecting actions to be executed, acquiring curiosity rewarding values of the agents, inputting the curiosity rewarding values into an attention network for further processing, and acquiring final rewarding values;
step 6: updating the parameters of the evaluation network and the policy network;
step 7: acquiring new environment observation information, acquiring experience playback quadruples and storing the experience playback quadruples in a playback experience buffer;
step 8: and repeatedly executing the steps 5-7, and updating the neural network in the multi-agent reinforcement learning algorithm until the iteration number reaches the maximum iteration number, thereby realizing the path planning task in the simulation environment.
2. The three-dimensional path planning simulation system of the four-rotor unmanned aerial vehicle according to claim 1, wherein the four-rotor unmanned aerial vehicle is modeled according to the attribute of the four-rotor unmanned aerial vehicle by adopting ROS software, and intelligent agents of the four-rotor unmanned aerial vehicle are added in a Pybullet simulation environment, wherein each intelligent agent is completely the same except the initial position; the target unit is defined as spherical and is located behind an obstacle.
3. The non-global curiosity network of claim 1, wherein the non-global property is embodied in that, when calculating curiosity rewards, a single agent does not treat all other agents as part of the environment but selects, according to the distance between agents, the state information of the agents that influence it as its environment state information; first, the current state s_t, the current action a_t and the next true state s_{t+1} are all input into the curiosity module, which comprises four sub-modules: two feature-extraction network modules for extracting the features φ(s_t) and φ(s_{t+1}) of the states; a forward model (Forward Model) for predicting the feature φ_pred(s_{t+1}) obtained by executing a_t in state s_t; and an inverse module for estimating the action a_pred from φ(s_t) and φ(s_{t+1}); the curiosity reward is calculated from the similarity between φ(s_{t+1}) and φ_pred(s_{t+1}).
4. The attention module of claim 1, wherein the curiosity reward sequence X = [x_1, ..., x_N] of the agents is first input into the attention network, and the importance of the curiosity rewards of different agents is learned through neural network processing; specifically, an attention variable z is adopted to represent the index position of the item selected from the input given the query variable q; given q and X, the probability of selecting the i-th input is α_i = p(z = i | X, q) = softmax(s(x_i, q)), where α_i is called the attention distribution and s(x_i, q) is the attention scoring function.
5. The deep reinforcement learning algorithm of claim 1, comprising an actor (Actor) network and a critic (Critic) network, wherein the two Critic networks are updated according to the temporal-difference error by minimizing
L(θ^Q) = (1/N) Σ_j ( y_j − Q(s_j, a_j | θ^Q) )²,
where N is the number of samples and Q(s_j, a_j | θ^Q) is, with the function parameters θ^Q fixed, the expected return the agent obtains from the state-action pair (s_j, a_j) by the end of the episode; y_j is expressed as
y_j = r_j + γ Q'(s_{j+1}, μ'(s_{j+1}) | θ^{Q'}),
where r_j is the reward value of the j-th step and γ is the discount factor balancing future rewards against current rewards; the parameters of the Actor network are updated in a gradient manner according to the Critic network's evaluation of the action:
∇_{θ^μ} J ≈ (1/N) Σ_j ∇_a Q(s, a | θ^Q) |_{s = s_j, a = μ(s_j)} ∇_{θ^μ} μ(s | θ^μ) |_{s = s_j},
where, in addition to the parameters described above, J is the function to be maximized, ∇_{θ^μ} J is its gradient with the parameters θ^μ taken as the variables of differentiation, and μ is equivalent to the policy function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310175856.1A CN116430891A (en) | 2023-02-28 | 2023-02-28 | Deep reinforcement learning method oriented to multi-agent path planning environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310175856.1A CN116430891A (en) | 2023-02-28 | 2023-02-28 | Deep reinforcement learning method oriented to multi-agent path planning environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116430891A true CN116430891A (en) | 2023-07-14 |
Family
ID=87080380
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310175856.1A Pending CN116430891A (en) | 2023-02-28 | 2023-02-28 | Deep reinforcement learning method oriented to multi-agent path planning environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116430891A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117492446A (en) * | 2023-12-25 | 2024-02-02 | 北京大学 | Multi-agent cooperation planning method and system based on combination and mixing optimization |
CN118625816A (en) * | 2024-08-12 | 2024-09-10 | 徐州市近距离智能科技有限公司 | Intelligent robot path planning method |
-
2023
- 2023-02-28 CN CN202310175856.1A patent/CN116430891A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117492446A (en) * | 2023-12-25 | 2024-02-02 | 北京大学 | Multi-agent cooperation planning method and system based on combination and mixing optimization |
CN118625816A (en) * | 2024-08-12 | 2024-09-10 | 徐州市近距离智能科技有限公司 | Intelligent robot path planning method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113110592B (en) | Unmanned aerial vehicle obstacle avoidance and path planning method | |
Chen et al. | Stabilization approaches for reinforcement learning-based end-to-end autonomous driving | |
Das et al. | A hybrid improved PSO-DV algorithm for multi-robot path planning in a clutter environment | |
Sun et al. | Crowd navigation in an unknown and dynamic environment based on deep reinforcement learning | |
Suh et al. | Fast sampling-based cost-aware path planning with nonmyopic extensions using cross entropy | |
CN112947081A (en) | Distributed reinforcement learning social navigation method based on image hidden variable probability model | |
CN113848974B (en) | Aircraft trajectory planning method and system based on deep reinforcement learning | |
Yan et al. | Immune deep reinforcement learning-based path planning for mobile robot in unknown environment | |
CN116661503A (en) | Cluster track automatic planning method based on multi-agent safety reinforcement learning | |
Petrazzini et al. | Proximal policy optimization with continuous bounded action space via the beta distribution | |
CN113391633A (en) | Urban environment-oriented mobile robot fusion path planning method | |
Gan et al. | Multi-usv cooperative chasing strategy based on obstacles assistance and deep reinforcement learning | |
CN116430891A (en) | Deep reinforcement learning method oriented to multi-agent path planning environment | |
CN114089751A (en) | Mobile robot path planning method based on improved DDPG algorithm | |
Lei et al. | Kb-tree: Learnable and continuous monte-carlo tree search for autonomous driving planning | |
Chen et al. | A study of unmanned path planning based on a double-twin RBM-BP deep neural network | |
CN115016499B (en) | SCA-QL-based path planning method | |
Qin et al. | A path planning algorithm based on deep reinforcement learning for mobile robots in unknown environment | |
CN114326826B (en) | Multi-unmanned aerial vehicle formation transformation method and system | |
Duo et al. | A deep reinforcement learning based mapless navigation algorithm using continuous actions | |
CN116027788A (en) | Intelligent driving behavior decision method and equipment integrating complex network theory and part of observable Markov decision process | |
Choi et al. | Collision avoidance of unmanned aerial vehicles using fuzzy inference system-aided enhanced potential field | |
Tang et al. | Reinforcement learning for robots path planning with rule-based shallow-trial | |
CN115164890A (en) | Swarm unmanned aerial vehicle autonomous motion planning method based on simulation learning | |
Priya et al. | Unmanned Aerial System Trajectory Tracking based on Diversified Grey Wolf Optimization Algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication |