
CN113359480B - Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm - Google Patents


Info

Publication number
CN113359480B
CN113359480B CN202110806485.3A CN202110806485A CN113359480B CN 113359480 B CN113359480 B CN 113359480B CN 202110806485 A CN202110806485 A CN 202110806485A CN 113359480 B CN113359480 B CN 113359480B
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
user
mth
tth moment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110806485.3A
Other languages
Chinese (zh)
Other versions
CN113359480A (en)
Inventor
赵建伟
吴官翰
贾维敏
张峰干
姜楠
王连锋
谭力宁
金伟
金国栋
沈涛
张聪
何芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rocket Force University of Engineering of PLA
Original Assignee
Rocket Force University of Engineering of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rocket Force University of Engineering of PLA
Priority to CN202110806485.3A
Publication of CN113359480A
Application granted
Publication of CN113359480B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a multi-unmanned aerial vehicle and user cooperative communication optimization method based on the MAPPO algorithm, which comprises the following steps: first, establishing an unmanned aerial vehicle network model and a user network model; second, setting the unmanned aerial vehicle and user scene; third, acquiring the observation states of the unmanned aerial vehicles and the users; fourth, acquiring the global states of the unmanned aerial vehicles and the users; fifth, obtaining the rewards of the unmanned aerial vehicles and the users; sixth, storing experience tuples; seventh, iteratively optimizing the parameters of the network models with the MAPPO algorithm; and eighth, optimizing and predicting the cooperative communication between multiple unmanned aerial vehicles and multiple users. By optimizing the parameters of the unmanned aerial vehicle and user network models, the invention optimizes the flight azimuth angle, power allocation, and bandwidth allocation of the unmanned aerial vehicles, adapts to the observation states of multiple unmanned aerial vehicles and multiple users to predict and output a reasonable cooperative communication optimization strategy, and maximizes the throughput of the communication system under multidimensional decision actions while satisfying the fairness of resource allocation.

Description

Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
Technical Field
The invention belongs to the technical field of unmanned aerial vehicles and user communication, and particularly relates to a multi-unmanned aerial vehicle and user cooperative communication optimization method based on a MAPPO algorithm.
Background
In current 5G mobile communication, the rapid development of emerging industries places heavy data-transmission pressure on the terrestrial backbone network. At the same time, because of geographical constraints, many remote areas still lack adequate wireless coverage. This unprecedented demand for high-quality wireless communication services poses significant challenges to traditional terrestrial communication networks. For this reason, using Unmanned Aerial Vehicles (UAVs) as aerial access nodes to assist terrestrial communication has become a promising solution for future 6G and beyond wireless communication.
As a flying base station, an unmanned aerial vehicle offers greater flexibility and more degrees of freedom, and can provide wireless coverage for users across varied terrain. On the one hand, it can offload part of the computing load that overflows from the ground, relieving the computing and transmission pressure on ground base stations; on the other hand, it can flexibly adjust its ground coverage range and area to follow randomly moving Ground Users (GUs). Meanwhile, because the air-to-ground link of an unmanned aerial vehicle has good line-of-sight characteristics, the probability of non-line-of-sight blockage and shadowing is greatly reduced. This reduces unnecessary path loss to some extent and, given the limited energy of the unmanned aerial vehicle and an equal Quality of Service (QoS), extends its working time.
Existing work mainly optimizes unmanned aerial vehicle trajectories under fixed communication resource allocation or under the allocation of a single communication resource. The optimization objectives are limited to unmanned aerial vehicle control or ground access control alone, and the problem has not been studied jointly from the perspectives of multiple unmanned aerial vehicles and multiple users.
Therefore, a MAPPO-based method for optimizing cooperative communication between multiple unmanned aerial vehicles and users is currently lacking: a method that, by optimizing the parameters of the unmanned aerial vehicle and user network models, optimizes the flight azimuth angle, power allocation, and bandwidth allocation of the unmanned aerial vehicles, adapts to the observation states of multiple unmanned aerial vehicles and multiple users to predict and output a reasonable cooperative communication optimization strategy, and maximizes the throughput of the communication system under multidimensional decision actions while satisfying the fairness of resource allocation.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a MAPPO-based cooperative communication optimization method for multiple unmanned aerial vehicles and users that has simple steps and a reasonable design. By optimizing the parameters of the unmanned aerial vehicle and user network models, the method optimizes the flight azimuth angle, power allocation, and bandwidth allocation of the unmanned aerial vehicles, adapts to the observation states of multiple unmanned aerial vehicles and multiple users to predict and output a reasonable cooperative communication optimization strategy, and maximizes the throughput of the communication system under multidimensional decision actions while satisfying the fairness of resource allocation.
To solve the above technical problem, the invention adopts the following technical scheme: a multi-unmanned aerial vehicle and user cooperative communication optimization method based on the MAPPO algorithm, characterized by comprising the following steps:
Step one, establishing an unmanned aerial vehicle network model and a user network model:
Step 101, setting the parameter of the unmanned aerial vehicle Actor network as $\phi$, the parameter of the unmanned aerial vehicle Critic network as $\omega_1$, the parameter of the user Actor network as $\theta$, and the parameter of the user Critic network as $\omega_2$;
Step 102, setting the initial value of the parameter $\phi$ of the unmanned aerial vehicle Actor network as $\phi(0)$, the initial value of the parameter $\omega_1$ of the unmanned aerial vehicle Critic network as $\omega_1(0)$, the initial value of the parameter $\theta$ of the user Actor network as $\theta(0)$, and the initial value of the parameter $\omega_2$ of the user Critic network as $\omega_2(0)$; wherein $\phi(0)$, $\omega_1(0)$, $\theta(0)$ and $\omega_2(0)$ are obtained by orthogonal initialization of the neural networks;
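Step 102 only states that the initial parameters satisfy orthogonal initialization. As a minimal sketch of what that can mean for a single weight matrix, the following builds mutually orthonormal rows by Gram-Schmidt on Gaussian draws. The function name `orthogonal_rows`, the `gain` scaling, and the restriction to `n_rows <= n_cols` are illustrative assumptions, not details taken from the patent.

```python
import math
import random


def orthogonal_rows(n_rows, n_cols, gain=1.0, seed=None):
    """Return n_rows mutually orthogonal rows of length n_cols (n_rows <= n_cols).

    Hypothetical helper: each row has norm `gain`, and distinct rows have
    zero inner product, which is the defining property of an orthogonal init.
    """
    if n_rows > n_cols:
        raise ValueError("this sketch needs n_rows <= n_cols")
    rng = random.Random(seed)
    basis = []
    while len(basis) < n_rows:
        v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        # Modified Gram-Schmidt: remove the components along earlier rows.
        for u in basis:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip the (very unlikely) degenerate draw
            basis.append([a / norm for a in v])
    return [[gain * a for a in row] for row in basis]


if __name__ == "__main__":
    w = orthogonal_rows(3, 5, seed=0)
    print(len(w), len(w[0]))
```

In practice a deep-learning framework's built-in orthogonal initializer would normally be used; the sketch only makes the property explicit.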
step two, setting unmanned aerial vehicles and user scenes:
Step 201, establishing a two-dimensional rectangular coordinate system OXY, wherein the plane of the coordinate system coincides with the ground area D;
step 202, setting N users in the ground area D, wherein the set of users is
Figure GDA0003376312920000021
Figure GDA0003376312920000022
Wherein, the position coordinate of the nth user at the tth moment is
Figure GDA0003376312920000023
where n and N are positive integers with $1 \le n \le N$, the ground area D lies in the first quadrant of OXY, the origin O coincides with the lower-left corner of the ground area D, and t is a positive integer;
Step 203, setting M unmanned aerial vehicles above the ground area D, wherein the set of unmanned aerial vehicles is
Figure GDA0003376312920000031
And is
Figure GDA0003376312920000032
The deployment heights of the M unmanned aerial vehicles relative to the ground area D are all h;
step three, acquiring the observation states of the unmanned aerial vehicle and the user:
step 301, setting the observation state of the nth user at the tth moment as
Figure GDA0003376312920000033
And is
Figure GDA0003376312920000034
Wherein,
Figure GDA0003376312920000035
indicating the coordinate position of the nth user at the time instant t,
Figure GDA0003376312920000036
denotes the two-dimensional coordinate position under OXY of the mth unmanned aerial vehicle that the nth user can access at the tth moment, where m and M are positive integers with $1 \le m \le M$; $s_m(t-j)$ denotes the number of users served by the mth unmanned aerial vehicle at the jth moment before the tth moment, where j is a positive integer, $j = 1, \ldots, w$, and w is a positive integer with $w < t$;
Step 302, inputting the observation state of the nth user at the tth moment
Figure GDA0003376312920000037
into the user Actor network whose parameter has the initial value $\theta(0)$; the user Actor network outputs the pre-activation component $x_m(\theta(0))$ for the mth unmanned aerial vehicle;
Step 303, using a computer to
Figure GDA0003376312920000038
Obtaining discrete probability distribution of action of the nth user selecting the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000039
Wherein exp (·) represents an exponential function with a natural constant e as the base,
Figure GDA00033763129200000310
representing the action of selecting the unmanned aerial vehicle by the nth user at the tth moment;
step 304, the nth user at the tth moment according to the discrete probability distribution
Figure GDA00033763129200000311
Sampling action
Figure GDA00033763129200000312
And selecting corresponding unmanned aerial vehicle for access, and acquiring action of selecting unmanned aerial vehicle by nth user at the tth moment
Figure GDA00033763129200000313
and its probability
Figure GDA00033763129200000314
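Steps 303 and 304 describe turning the pre-activation components $x_m$ into a discrete distribution with the exponential function and then sampling a UAV from it. The sketch below assumes this is the usual softmax over the M pre-activations; the function names and the max-subtraction for numerical stability are additions for illustration only.

```python
import math
import random


def softmax(preacts):
    """Softmax over the pre-activation components x_1, ..., x_M."""
    m = max(preacts)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in preacts]
    total = sum(exps)
    return [e / total for e in exps]


def sample_uav(preacts, rng=random):
    """Sample the index of the UAV to access and return it with its probability."""
    probs = softmax(preacts)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs[idx]


if __name__ == "__main__":
    chosen, prob = sample_uav([0.2, 1.5, -0.3], random.Random(0))
    print(chosen, round(prob, 3))
```

The returned probability is the quantity that is later stored with the action and reused when the policy is updated.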
Step 305, using a computer, setting, according to the users' selections and the state of the unmanned aerial vehicle, the observation state of the mth unmanned aerial vehicle at the tth moment as
Figure GDA00033763129200000315
And is
Figure GDA00033763129200000316
Wherein,
Figure GDA00033763129200000317
the two-dimensional coordinate position of the mth unmanned aerial vehicle under the OXY at the tth moment is shown,
Figure GDA00033763129200000318
indicating the coordinate positions of other unmanned aerial vehicles under OXY after the mth unmanned aerial vehicle is removed at the tth moment, wherein m 'is a positive integer, m' ≠ m, and
Figure GDA00033763129200000319
$\sigma_{m,n}(t)$ represents the access status of the nth user with respect to the mth unmanned aerial vehicle;
Step 306, using a computer, inputting the observation state of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000041
into the unmanned aerial vehicle Actor network whose parameter has the initial value $\phi(0)$; the unmanned aerial vehicle Actor network outputs, for the observation state of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000042
and the action of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000043
the probability distribution
Figure GDA0003376312920000044
Wherein,
Figure GDA0003376312920000045
obeying a beta distribution, i.e.
Figure GDA0003376312920000046
where $\alpha_\phi$ and $\beta_\phi$ are the shape parameters of the Beta distribution;
Figure GDA0003376312920000047
the action of the mth unmanned aerial vehicle at the tth moment is shown;
according to
Figure GDA0003376312920000048
Sampling action
Figure GDA0003376312920000049
Obtaining the transmitting power output value of the mth unmanned aerial vehicle to the nth user at the tth moment
Figure GDA00033763129200000410
Bandwidth output value of mth unmanned aerial vehicle to nth user at tth moment
Figure GDA00033763129200000411
And the flight azimuth angle of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200000412
And the motion of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200000413
and its probability
Figure GDA00033763129200000414
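Step 306 draws each continuous action component from a Beta distribution with shape parameters $\alpha_\phi$ and $\beta_\phi$. A minimal sketch for one component, the flight azimuth, follows. The mapping of the Beta sample from [0, 1] to [0, 2π), the clamping constant, and the function names are assumptions for illustration; the patent does not state how the sample is scaled.

```python
import math
import random


def beta_log_pdf(x, alpha, beta):
    """Log-density of Beta(alpha, beta) at x in (0, 1)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)


def sample_azimuth(alpha, beta, rng=random):
    """Sample a normalised action from Beta(alpha, beta) and map it to an angle.

    Returns (azimuth in radians, density of the normalised sample).
    """
    x = rng.betavariate(alpha, beta)
    x = min(max(x, 1e-6), 1.0 - 1e-6)  # keep the logarithms finite
    azimuth = 2.0 * math.pi * x        # assumption: linear map to [0, 2*pi)
    return azimuth, math.exp(beta_log_pdf(x, alpha, beta))


if __name__ == "__main__":
    angle, density = sample_azimuth(2.0, 3.0, random.Random(0))
    print(round(angle, 3), round(density, 3))
```

With both shape parameters at least 1 the density is unimodal or flat on the bounded interval, so no probability mass falls outside the admissible action range.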
Step 307, setting by computer
Figure GDA00033763129200000415
as the action mask of the mth unmanned aerial vehicle at the tth moment, and, using a computer, letting
Figure GDA00033763129200000416
And
Figure GDA00033763129200000417
wherein,
Figure GDA00033763129200000418
denotes the masked power value of the mth unmanned aerial vehicle for the nth user at the tth moment,
Figure GDA00033763129200000419
denotes the masked bandwidth value of the mth unmanned aerial vehicle for the nth user at the tth moment;
step 308, using a computer to
Figure GDA00033763129200000420
obtaining the transmit-power action component $p_{m,n}(t)$ allocated by the mth unmanned aerial vehicle to the nth user at the tth moment;
By computer according to
Figure GDA00033763129200000421
obtaining the bandwidth-resource action component $b_{m,n}(t)$ allocated by the mth unmanned aerial vehicle to the nth user at the tth moment; wherein $b_m(t)$ represents the bandwidth resources that the mth unmanned aerial vehicle can allocate at the tth moment, and
Figure GDA00033763129200000422
$B_{\mathrm{total}}$ represents the total bandwidth resource shared by all unmanned aerial vehicles, $s_m(t)$ represents the total number of users accessing the mth unmanned aerial vehicle, and $b_{\min}$ represents the minimum divisible bandwidth;
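Steps 307 and 308 mask the raw power and bandwidth outputs with the access status $\sigma_{m,n}(t)$ so that only users who selected the UAV receive resources. The sketch below illustrates the idea under a loudly stated assumption: the masked outputs are normalised proportionally to a fixed budget. The exact allocation formulas of the patent are not reproduced here, and `masked_allocation` and `budget` are hypothetical names.

```python
def masked_allocation(raw_outputs, mask, budget):
    """Share `budget` among users in proportion to their masked raw outputs.

    raw_outputs: per-user network outputs in [0, 1] (one entry per user)
    mask:        per-user access status, 1 if the user selected this UAV, else 0
    budget:      total resource this UAV can hand out (assumed, e.g. power or bandwidth)
    """
    masked = [r * s for r, s in zip(raw_outputs, mask)]
    total = sum(masked)
    if total == 0:
        # No user selected this UAV (or all raw outputs are zero): allocate nothing.
        return [0.0] * len(masked)
    return [budget * v / total for v in masked]


if __name__ == "__main__":
    print(masked_allocation([0.2, 0.5, 0.3], [1, 0, 1], 10.0))
```

Because unselected users are zeroed before normalisation, the effective dimension of the allocation follows the number of users that actually accessed the UAV.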
step 309, obtaining the motion of the mth unmanned aerial vehicle at the tth moment by using a computer
Figure GDA00033763129200000423
And is
Figure GDA0003376312920000051
Wherein,
Figure GDA0003376312920000052
representing the flight azimuth angle of the mth unmanned aerial vehicle at the tth moment;
Step 30A, the observation state of the nth user at the tth moment
Figure GDA0003376312920000053
and the observation state of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000054
are merged and recorded as the observation state of the ith agent at the tth moment
Figure GDA0003376312920000055
wherein the agents comprise the M unmanned aerial vehicles and the N users, i is a positive integer, and
Figure GDA0003376312920000056
the action of the nth user selecting an unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000057
and the action of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000058
are merged and recorded as the action of the ith agent at the tth moment
Figure GDA0003376312920000059
the probability of the action of the nth user selecting an unmanned aerial vehicle at the tth moment
Figure GDA00033763129200000510
namely
Figure GDA00033763129200000511
and the probability of the action of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200000512
namely
Figure GDA00033763129200000513
are merged and recorded as the action probability of the ith agent
Figure GDA00033763129200000514
Step four, acquiring global states of the unmanned aerial vehicle and the user:
Step 401, using a computer, inputting $p_{m,n}(t)$ and $b_{m,n}(t)$ from step 309 according to the Shannon channel capacity, to obtain the theoretical communication rate $c_{m,n}(t)$ provided by the mth unmanned aerial vehicle to the nth user at the tth moment;
Step 402, using a computer according to
Figure GDA00033763129200000515
Obtaining the communication speed of the nth user at the t moment
Figure GDA00033763129200000516
Step 403, setting the global state of the mth unmanned aerial vehicle at the tth moment to be
Figure GDA00033763129200000517
And is
Figure GDA00033763129200000518
Step 404, setting global state of nth user at tth moment as
Figure GDA00033763129200000519
Figure GDA00033763129200000520
Wherein,
Figure GDA00033763129200000521
indicating the coordinate positions of other users under OXY after the nth user is removed at the tth moment, n 'is a positive integer, n' ≠ n, and
Figure GDA00033763129200000522
Step 405, the global state of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200000523
and the global state of the nth user at the tth moment
Figure GDA00033763129200000524
are merged and recorded as the global state of the ith agent at the tth moment
Figure GDA00033763129200000525
Wherein i is a positive integer, and
Figure GDA00033763129200000526
step five, obtaining the rewards of the unmanned aerial vehicle and the user:
step 501, adopt the computer to according to
Figure GDA0003376312920000061
obtaining the average communication rate $c_{\mathrm{mean}}(t)$ of the N users at the tth moment;
Step 502, using a computer according to
Figure GDA0003376312920000062
obtaining the fairness index $f_m(t)$ of the mth unmanned aerial vehicle at the tth moment;
Step 503, using computer to
Figure GDA0003376312920000063
Obtain reward of mth unmanned aerial vehicle at tth moment
Figure GDA0003376312920000064
wherein $r_d$ denotes the reward factor of the unmanned aerial vehicle, $\kappa_r$ is the exponent parameter of $f_m(t)$,
Figure GDA0003376312920000065
the boundary penalty item of the mth unmanned aerial vehicle at the tth moment is represented;
step 504, using a computer to
Figure GDA0003376312920000066
Receive the reward of the nth user at the tth moment
Figure GDA0003376312920000067
wherein $r_c$ denotes the reward factor of the user;
Step 505, using a computer, the reward of the nth user at the tth moment
Figure GDA0003376312920000068
and the reward of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000069
are merged and recorded as the reward of the ith agent at the tth moment
Figure GDA00033763129200000610
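Step five builds the rewards from an average communication rate and a per-UAV fairness index $f_m(t)$. The sketch below shows the two ingredients only. It assumes the fairness index is Jain's index over the rates of the users served by one UAV, which is a common choice but is not confirmed by the text; the reward formulas themselves are not reproduced, and the function names are hypothetical.

```python
def mean_rate(rates):
    """Average communication rate over all users."""
    return sum(rates) / len(rates)


def jain_index(rates):
    """Jain's fairness index: 1.0 for equal rates, 1/len(rates) if one user gets everything.

    Assumption: used here as a stand-in for the fairness index f_m(t).
    """
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 0.0  # convention for the all-zero case
    return sum(rates) ** 2 / (len(rates) * squares)


if __name__ == "__main__":
    served = [2.0e6, 1.5e6, 0.5e6]  # illustrative rates in bit/s
    print(mean_rate(served), round(jain_index(served), 3))
```

An index of this kind lets the UAV reward grow with throughput while being scaled down when the rates of its served users are very unequal.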
Step six, storing experience tuples:
Step 601, using a computer, taking
Figure GDA00033763129200000611
as the experience tuple of the ith agent at the tth moment and storing it in the buffer;
Step 602, repeating step three to step 601 to obtain the experience tuple of the next moment and store it in the buffer, until $t = T_{\max}$, at which point the data storage of one round is complete; wherein $T_{\max}$ denotes the total number of moments in each round;
Step 603, repeating step 602 to store the data of the next round until the number of experience tuples in the buffer reaches B, thereby obtaining the first round of training data; wherein B is greater than $T_{\max}$;
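The collection loop of step six can be sketched as follows. The field names of `Experience` are guesses based on the quantities produced in steps three to five (observation, global state, action, action probability, reward), and `step_fn` is a hypothetical callable standing in for one pass through step three to step 601; neither is taken from the patent.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Experience:
    """One agent's record for one moment (field names are illustrative)."""
    observation: Any
    global_state: Any
    action: Any
    action_prob: float
    reward: float


def collect_training_data(step_fn: Callable[[int], Experience],
                          t_max: int, batch_size: int) -> List[Experience]:
    """Run rounds of t_max moments until batch_size tuples are buffered."""
    if batch_size <= t_max:
        raise ValueError("B must be greater than T_max")
    buffer: List[Experience] = []
    while len(buffer) < batch_size:
        for t in range(1, t_max + 1):      # one round: t = 1, ..., T_max
            buffer.append(step_fn(t))
            if len(buffer) == batch_size:  # stop once B tuples are stored
                break
    return buffer
```

For simplicity the sketch stores one tuple per moment; with M + N agents each moment would contribute one tuple per agent.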
Step seven, iteratively optimizing the parameters of the network models with the MAPPO algorithm:
Step 701, inputting the first round of training data and, using a computer, performing gradient-ascent optimization on the parameter $\phi$ of the unmanned aerial vehicle Actor network and the parameter $\theta$ of the user Actor network with the MAPPO algorithm, to obtain the first-round optimized value of the parameter $\phi$ of the unmanned aerial vehicle Actor network and the first-round optimized value of the parameter $\theta$ of the user Actor network;
meanwhile, using a computer, performing gradient-descent optimization on the parameter $\omega_1$ of the centralized unmanned aerial vehicle Critic network and the parameter $\omega_2$ of the user Critic network with the MAPPO algorithm, to obtain the first-round optimized value of the parameter $\omega_1$ of the unmanned aerial vehicle Critic network and the first-round optimized value of the parameter $\omega_2$ of the user Critic network;
Step 702, obtaining the next round of training data according to the method of step three to step 603;
Step 703, inputting the next round of training data and, according to the method of step 701, performing the next round of optimization updates with the previous round's optimized values as the initial parameter values, to obtain the next-round optimized value of the parameter $\phi$ of the unmanned aerial vehicle Actor network, the next-round optimized value of the parameter $\theta$ of the user Actor network, the next-round optimized value of the parameter $\omega_1$ of the unmanned aerial vehicle Critic network, and the next-round optimized value of the parameter $\omega_2$ of the user Critic network;
Step 704, according to the method of step three to step 603, completing data storage for the set maximum number of rounds $T_h$ to obtain the P-th round of training data; wherein P is a positive integer;
Step 705, inputting the P-th round of training data and, according to the method of step 701, with the previous round's optimized values as the initial parameter values, obtaining the P-th-round optimized value of the parameter $\phi$ of the unmanned aerial vehicle Actor network, the P-th-round optimized value of the parameter $\theta$ of the user Actor network, the P-th-round optimized value of the parameter $\omega_1$ of the unmanned aerial vehicle Critic network, and the P-th-round optimized value of the parameter $\omega_2$ of the user Critic network;
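Step seven applies gradient ascent to the Actor parameters and gradient descent to the Critic parameters. In PPO-family algorithms such as MAPPO, the Actor objective is usually the clipped surrogate and the Critic loss a squared error against return targets. The sketch below evaluates both quantities on a batch. It assumes advantages and returns have already been computed from the stored rewards, and the clipping range `clip_eps=0.2` is a common default rather than a value from the patent.

```python
def ppo_clip_objective(new_probs, old_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective, to be maximised by the Actor (gradient ascent)."""
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old  # importance ratio against the stored action probability
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def value_loss(values, returns):
    """Mean squared error, to be minimised by the Critic (gradient descent)."""
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)


if __name__ == "__main__":
    print(ppo_clip_objective([0.6, 0.1], [0.4, 0.2], [1.0, -1.0]))
    print(value_loss([1.0, 2.0], [1.5, 1.0]))
```

The stored action probabilities from step six play the role of `old_probs`, which is why they are kept in the experience tuples.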
Step eight, optimizing and predicting the cooperative communication of multiple unmanned aerial vehicles and multiple users:
Step 801, obtaining the optimized network model from the P-th-round optimized value of the parameter $\phi$ of the unmanned aerial vehicle Actor network, the P-th-round optimized value of the parameter $\theta$ of the user Actor network, the P-th-round optimized value of the parameter $\omega_1$ of the unmanned aerial vehicle Critic network, and the P-th-round optimized value of the parameter $\omega_2$ of the user Critic network;
Step 802, acquiring the observation state of the nth user and the observation state of the mth unmanned aerial vehicle at a subsequent moment and inputting them into the optimized network model, to obtain the cooperative communication optimization action strategy of the mth unmanned aerial vehicle and the nth user at that moment.
The MAPPO-based multi-unmanned aerial vehicle and user cooperative communication optimization method is further characterized in that: in step 401, using a computer, $p_{m,n}(t)$ and $b_{m,n}(t)$ from step 309 are input according to the Shannon channel capacity to obtain the theoretical communication rate $c_{m,n}(t)$ provided by the mth unmanned aerial vehicle to the nth user at the tth moment, and the specific process is as follows:
step 4011, using computer according to formula
Figure GDA0003376312920000081
Obtaining LoS link probability from the mth unmanned aerial vehicle to the nth user at the tth moment
Figure GDA0003376312920000082
wherein a denotes a first environment-dependent constant, b denotes a second environment-dependent constant, and $d_{m,n}(t)$ denotes the straight-line distance from the mth unmanned aerial vehicle to the nth user at the tth moment;
step 4012, using computer according to formula
Figure GDA0003376312920000083
Obtaining the path loss from the mth unmanned aerial vehicle to the nth user at the tth moment under the LoS link
Figure GDA0003376312920000084
wherein $\xi_{\mathrm{LoS}}$ denotes the additional loss under the LoS link, c denotes the speed of light, and $f_c$ denotes the signal carrier frequency;
step 4013, adopting computer to calculate according to formula
Figure GDA0003376312920000085
Obtaining the path loss from the mth unmanned aerial vehicle to the nth user at the tth moment under the NLoS link
Figure GDA0003376312920000086
wherein $\xi_{\mathrm{NLoS}}$ denotes the additional loss under the NLoS link;
step 4014, using computer according to formula
Figure GDA0003376312920000087
obtaining the path loss $PL_{m,n}(t)$ of the signal from the mth unmanned aerial vehicle to the nth user; wherein,
Figure GDA0003376312920000088
the probability of NLoS link from the mth unmanned aerial vehicle to the nth user at the tth moment is represented, and
Figure GDA0003376312920000089
step 4015, using computer according to formula
Figure GDA00033763129200000810
obtaining the power of the signal that the nth user receives from the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200000811
Step 4016, using computer according to formula
Figure GDA00033763129200000812
obtaining the theoretical communication rate $c_{m,n}(t)$ provided by the mth unmanned aerial vehicle to the nth user at the tth moment; wherein $n_0$ denotes the power spectral density of the white Gaussian noise in the channel.
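Steps 4011 to 4016 chain a LoS probability, LoS and NLoS path losses, a received power, and a Shannon rate. The sketch below strings these together under the widely used probabilistic air-to-ground channel model (sigmoid LoS probability in the elevation angle, free-space loss plus an additional loss), which is consistent with the symbols and parameter ranges named in the text but is an assumption about the exact formulas. All function names and the numeric values in the usage example are illustrative.

```python
import math


def los_probability(d, h, a, b):
    """LoS probability from the elevation angle (degrees) of a UAV at height h, distance d."""
    elevation_deg = math.degrees(math.asin(h / d))
    return 1.0 / (1.0 + a * math.exp(-b * (elevation_deg - a)))


def path_loss_db(d, f_c, xi_db, c=3.0e8):
    """Free-space path loss in dB plus the additional loss xi_db of the link type."""
    return 20.0 * math.log10(4.0 * math.pi * f_c * d / c) + xi_db


def rate(p_tx, bandwidth, d, h, f_c, a, b, xi_los, xi_nlos, n0):
    """Shannon rate in bit/s for transmit power p_tx (W) and allocated bandwidth (Hz)."""
    p_los = los_probability(d, h, a, b)
    # Expected path loss over the LoS and NLoS cases, with P_NLoS = 1 - P_LoS.
    pl = p_los * path_loss_db(d, f_c, xi_los) + (1.0 - p_los) * path_loss_db(d, f_c, xi_nlos)
    p_rx = p_tx * 10.0 ** (-pl / 10.0)  # received power in W
    return bandwidth * math.log2(1.0 + p_rx / (bandwidth * n0))


if __name__ == "__main__":
    # Illustrative values only: 2 GHz carrier, 1 MHz bandwidth, 0.1 W transmit power.
    print(rate(0.1, 1.0e6, 150.0, 100.0, 2.0e9, 9.61, 0.16, 1.0, 20.0, 4.0e-21))
```

The example constants a = 9.61 and b = 0.16 fall inside the ranges the method allows for these two environment-dependent constants.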
The MAPPO-based multi-unmanned aerial vehicle and user cooperative communication optimization method is further characterized in that: in step 4011, $4.88 < a < 28$ and $0 < b < 1$;
in step 4012 and step 4013, the additional loss $\xi_{\mathrm{NLoS}}$ under the NLoS link is greater than the additional loss $\xi_{\mathrm{LoS}}$ under the LoS link, $\xi_{\mathrm{LoS}}$ takes values in (0 dB, 50 dB), and $\xi_{\mathrm{NLoS}}$ takes values in (10 dB, 100 dB);
the user reward factor $r_c$ in step 504 ranges from 1 to 3;
the unmanned aerial vehicle reward factor $r_d$ in step 503 ranges from 1 to 5, and $r_d$ is greater than $r_c$; the exponent parameter $\kappa_r$ is a positive integer from 1 to 5.
The MAPPO algorithm-based multi-unmanned aerial vehicle and user cooperative communication optimization method is characterized by comprising the following steps: boundary penalty item of mth unmanned aerial vehicle at tth moment in step 503
Figure GDA00033763129200000921
The specific process of obtaining is as follows:
Step 5031, setting the upper bound of the ground area D on the X axis as $u_{\max,x}$, the upper bound on the Y axis as $u_{\max,y}$, the lower bound on the X axis as $u_{\min,x}$, and the lower bound on the Y axis as $u_{\min,y}$, with $u_{\min,x} = u_{\min,y} = 0$;
Step 5032, adopting a computer to determine the position of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000091
Obtaining the X coordinate of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000092
And the Y coordinate of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000093
Step 5033, when
Figure GDA0003376312920000094
is greater than $u_{\max,x}$ or
Figure GDA0003376312920000095
is less than $u_{\min,x}$, using a computer according to
Figure GDA0003376312920000096
to obtain the boundary penalty term of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000097
wherein $r_b$ denotes the penalty factor and $\kappa_b$ denotes the gradient factor that determines the smoothness of the boundary function; the penalty factor $r_b$ ranges from 10 to 50, and the gradient factor $\kappa_b$ ranges from 0.07 to 0.1;
when
Figure GDA0003376312920000098
is greater than $u_{\max,y}$ or
Figure GDA0003376312920000099
is less than $u_{\min,y}$, using a computer according to
Figure GDA00033763129200000910
to obtain the boundary penalty term of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200000911
when
Figure GDA00033763129200000912
is greater than $u_{\max,x}$ and
Figure GDA00033763129200000913
is greater than $u_{\max,y}$, or
Figure GDA00033763129200000914
is less than $u_{\min,x}$ and
Figure GDA00033763129200000915
is less than $u_{\min,y}$, using a computer according to
Figure GDA00033763129200000916
to obtain the boundary penalty term of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200000917
when
Figure GDA00033763129200000918
and
Figure GDA00033763129200000919
are both located within the ground area D,
Figure GDA00033763129200000920
The MAPPO-based multi-unmanned aerial vehicle and user cooperative communication optimization method is further characterized in that: in step 301, the value of w ranges from 3 to 20;
$\alpha_\phi$ and $\beta_\phi$ in step 306 satisfy $\alpha_\phi \ge 1$ and $\beta_\phi \ge 1$.
The MAPPO-based multi-unmanned aerial vehicle and user cooperative communication optimization method is further characterized in that: the maximum number of rounds $T_h$ set in step 704 ranges from 5000 to 6000;
and the total number of rounds is
Figure GDA0003376312920000101
Compared with the prior art, the invention has the following advantages:
1. the method has simple steps and reasonable design, is suitable for the games of a plurality of unmanned aerial vehicles and a plurality of users, realizes the prediction of the cooperative communication optimization strategy, maximizes the throughput of a communication system under the action of multidimensional decision and meets the fairness of resource allocation.
2. The method comprises the steps of firstly establishing an unmanned aerial vehicle network model and a user network model, then obtaining training data through unmanned aerial vehicle and user scene setting, unmanned aerial vehicle and user observation state obtaining, unmanned aerial vehicle and user global state obtaining, unmanned aerial vehicle and user reward obtaining and experience tuple storing, and training the training data through MAPPO algorithm to realize updating and optimization of parameters of the network model to obtain an optimized network model; and finally, inputting the observation state of the user and the observation state of the unmanned aerial vehicle at the subsequent moment into the optimized network model so as to obtain the cooperative communication optimization strategy of the unmanned aerial vehicle and the user.
3. The invention trains and iterates the parameters of the unmanned aerial vehicle Actor network, the user Actor network, the unmanned aerial vehicle Critic network, and the user Critic network with the MAPPO algorithm, so that each user greedily maximizes its own communication rate through a competitive strategy, while each unmanned aerial vehicle intelligently allocates power and bandwidth resources to the users that choose to access it, dynamically decides its flight azimuth angle, and, through cooperation with the other unmanned aerial vehicles, forms the spatial topology best suited to the current environment.
4. The invention performs joint optimization on the access strategy of users, the power distributed by the unmanned aerial vehicles, the bandwidth resource scheduling distributed by the unmanned aerial vehicles and the flight azimuth angle of the unmanned aerial vehicles, and all the unmanned aerial vehicles share the total bandwidth resource, thereby maximizing the system throughput through dynamic resource scheduling and simultaneously ensuring the fairness of the communication rate among the users under the condition of meeting the constraint condition of the minimum communication rate of each user.
5. The invention adopts the MAPPO (Multi-Agent Proximal Policy Optimization) algorithm to handle the coexistence of discrete and continuous actions across several types of agents. Unlike earlier methods that decide the multidimensional actions of an unmanned aerial vehicle cluster centrally, the MAPPO algorithm accounts for the partial observability of real conditions, so that each agent makes distributed decisions relying only on its own observations. This overcomes the defects, such as excessive dimensionality and lack of scalability, that a centralized decision mode causes when a single-agent reinforcement learning algorithm is applied to a multi-agent problem.
6. Aiming at the practical problem that different unmanned aerial vehicles can be selectively accessed by different numbers of users, the unmanned aerial vehicle resource allocation strategy dimensionality is dynamically adjusted by setting the action mask, and the user information which is not selectively accessed is shielded by the action mask, namely, the unmanned aerial vehicle only needs to allocate resources for the user which is selectively accessed.
7. Because the flight azimuth angle of the unmanned aerial vehicle is bounded when it is optimized, a parameterized Beta policy is adopted in place of the conventional Gaussian policy; this avoids the biased estimation of the Gaussian policy when the action of the unmanned aerial vehicle is bounded, and mitigates convergence of the unmanned aerial vehicle to a local optimum in a multimodal reward environment.
8. The invention applies policy-based allocation not only to power but also to bandwidth, thereby increasing the flexibility and the degrees of freedom of allocation.
In conclusion, the method provided by the invention has the advantages of simple steps and reasonable design, realizes the optimization of the flight azimuth angle, the power and the bandwidth distribution of the unmanned aerial vehicle through the optimization of the unmanned aerial vehicle and the user network model parameters, effectively adapts to the observation states of a plurality of unmanned aerial vehicles and a plurality of users to predict and output a reasonable cooperative communication optimization strategy, and realizes the maximization of the throughput of a communication system under the action of a multidimensional decision and meets the fairness of resource distribution.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a block diagram of the process flow of the present invention.
Detailed Description
As shown in fig. 1, a method for optimizing cooperative communication between multiple drones and users based on MAPPO algorithm includes the following steps:
step one, establishing an unmanned aerial vehicle network model and a user network model:
Step 101, setting the parameters of the unmanned aerial vehicle Actor network as φ, the parameters of the unmanned aerial vehicle Critic network as ω_1, the parameters of the user Actor network as θ, and the parameters of the user Critic network as ω_2;
Step 102, setting the initial value of the parameter φ of the unmanned aerial vehicle Actor network to φ(0), the initial value of the parameter ω_1 of the unmanned aerial vehicle Critic network to ω_1(0), the initial value of the parameter θ of the user Actor network to θ(0), and the initial value of the parameter ω_2 of the user Critic network to ω_2(0); wherein φ(0), ω_1(0), θ(0) and ω_2(0) are obtained by orthogonal initialization of the neural networks;
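As a non-limiting illustration of the orthogonal initialization named in step 102, the following sketch builds a weight matrix whose rows (or columns) are orthonormal by applying Gram-Schmidt orthogonalization to random Gaussian vectors. The function name `orthogonal_init`, the `gain` argument and the use of Gram-Schmidt are assumptions made for this example; the source does not state how the orthogonal matrices are generated.

```python
import random


def orthogonal_init(rows, cols, gain=1.0, seed=None):
    """Return a rows x cols list-of-lists weight matrix.

    If rows <= cols the rows are orthonormal, otherwise the columns are.
    Hypothetical helper: the patent only requires "orthogonal initialization".
    """
    rng = random.Random(seed)
    transpose = rows > cols
    n, d = (cols, rows) if transpose else (rows, cols)  # n vectors of dimension d
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Modified Gram-Schmidt: strip components along accepted vectors.
        for u in basis:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:  # reject numerically degenerate draws
            basis.append([a / norm for a in v])
    if transpose:
        mat = [[basis[c][r] for c in range(cols)] for r in range(rows)]
    else:
        mat = basis
    return [[gain * x for x in row] for row in mat]
```

Under these assumptions, each layer of the four networks could be initialized with one call such as `orthogonal_init(64, 32, seed=0)`.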
step two, setting unmanned aerial vehicles and user scenes:
step 201, establishing a two-dimensional rectangular coordinate system OXY; wherein, the two-dimensional rectangular coordinate system is superposed with the ground area D;
step 202, setting N users in the ground area D, wherein the set of users is
Figure GDA0003376312920000121
Figure GDA0003376312920000122
Wherein, the position coordinate of the nth user at the tth moment is
Figure GDA0003376312920000123
n and N are positive integers, 1 ≤ n ≤ N, the ground area D is located in the first quadrant of OXY, the origin O coincides with the lower-left corner of the ground area D, and t is a positive integer;
Step 203, setting M unmanned aerial vehicles above the ground area D, wherein the set of unmanned aerial vehicles is
Figure GDA0003376312920000124
And is
Figure GDA0003376312920000125
The deployment heights of the M unmanned aerial vehicles relative to the ground area D are all h;
step three, acquiring the observation states of the unmanned aerial vehicle and the user:
step 301, setting the observation state of the nth user at the tth moment as
Figure GDA0003376312920000126
And is
Figure GDA0003376312920000127
Wherein,
Figure GDA0003376312920000128
indicating the coordinate position of the nth user at the time instant t,
Figure GDA0003376312920000129
represents the two-dimensional coordinate position under OXY, at the tth moment, of the mth unmanned aerial vehicle that the nth user can access, m and M are positive integers, and 1 ≤ m ≤ M; s_m(t−j) represents the number of users served by the mth unmanned aerial vehicle at the jth moment before the tth moment, j is a positive integer, and j = 1, …, w; w is a positive integer, and w < t;
step 302, the observation state of the nth user at the tth moment
Figure GDA0003376312920000131
is input into the user Actor network with initial parameter value θ(0), and the user Actor network outputs the pre-activation component x_m(θ(0)) of the mth unmanned aerial vehicle;
Step 303, using a computer to
Figure GDA0003376312920000132
Obtaining discrete probability distribution of action of the nth user selecting the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000133
Wherein exp (·) represents an exponential function with a natural constant e as the base,
Figure GDA0003376312920000134
representing the action of selecting the unmanned aerial vehicle by the nth user at the tth moment;
Step 304, the nth user at the tth moment, according to the discrete probability distribution
Figure GDA0003376312920000135
samples an action
Figure GDA0003376312920000136
and selects the corresponding unmanned aerial vehicle for access, and obtains, for the action of the nth user selecting an unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000137
its probability
Figure GDA0003376312920000138
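As a non-limiting illustration of steps 302 to 304, the sketch below turns the pre-activation components x_m output by the user Actor network into a discrete probability distribution with the exponential function described in step 303, samples the index of the unmanned aerial vehicle to access, and returns the probability of the sampled action. The exact formula sits in an equation image not reproduced here, so the standard softmax form is an assumption, as are the function names and the max-subtraction used for numerical stability.

```python
import math
import random


def softmax(logits):
    """Map pre-activation components x_m to probabilities over the M UAVs."""
    m = max(logits)  # subtracting the max leaves the result unchanged
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_uav(logits, rng=random):
    """Sample which UAV a user accesses; return (index, probability)."""
    probs = softmax(logits)
    choice = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return choice, probs[choice]
```

The returned probability is the quantity that would later be stored with the action in the experience tuple.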
Step 305, using a computer, according to the users' selections and the state of the unmanned aerial vehicle, setting the observation state of the mth unmanned aerial vehicle at the tth moment as
Figure GDA0003376312920000139
And is
Figure GDA00033763129200001310
Wherein,
Figure GDA00033763129200001311
the two-dimensional coordinate position of the mth unmanned aerial vehicle under the OXY at the tth moment is shown,
Figure GDA00033763129200001312
represents the coordinate positions under OXY, at the tth moment, of the unmanned aerial vehicles other than the mth unmanned aerial vehicle, m' is a positive integer, m' ≠ m, and
Figure GDA00033763129200001313
σ_{m,n}(t) represents the access status of the nth user with respect to the mth unmanned aerial vehicle;
Step 306, using a computer, inputting the observation state of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001314
into the unmanned aerial vehicle Actor network with initial parameter value φ(0), the unmanned aerial vehicle Actor network outputting, given the observation state of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001315
for the action of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001316
the probability distribution
Figure GDA00033763129200001317
Wherein,
Figure GDA00033763129200001318
obeys a Beta distribution, i.e.
Figure GDA00033763129200001319
where α_φ and β_φ are both shape parameters of the Beta distribution;
Figure GDA00033763129200001320
represents the action of the mth unmanned aerial vehicle at the tth moment;
according to
Figure GDA00033763129200001321
sampling an action
Figure GDA00033763129200001322
to obtain the transmit power output value of the mth unmanned aerial vehicle for the nth user at the tth moment
Figure GDA00033763129200001323
the bandwidth output value of the mth unmanned aerial vehicle for the nth user at the tth moment
Figure GDA0003376312920000141
and the flight azimuth angle of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000142
and, for the action of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000143
its probability
Figure GDA0003376312920000144
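As a non-limiting illustration of the Beta policy in step 306, the sketch below samples a bounded action from Beta(α_φ, β_φ), maps it to a flight azimuth angle, and evaluates the log-density of the sample. The mapping of the unit interval to [0, 2π] is an assumption made for this example, since the source does not state the azimuth range; the function names are likewise hypothetical. The log-density is that of the unit-interval sample, which differs from the density of the angle only by a constant.

```python
import math
import random


def beta_log_pdf(x, alpha, beta):
    """Log-density of Beta(alpha, beta) at x in (0, 1)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)


def sample_azimuth(alpha, beta, rng=random):
    """Sample a bounded azimuth; return (angle in radians, log-probability)."""
    x = rng.betavariate(alpha, beta)
    x = min(max(x, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
    azimuth = 2.0 * math.pi * x  # assumed mapping of [0, 1] to [0, 2*pi]
    return azimuth, beta_log_pdf(x, alpha, beta)
```

Because the Beta distribution has support only on the unit interval, no sampled action ever needs clipping, which is the property the text contrasts with a Gaussian policy.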
Step 307, using a computer, setting
Figure GDA0003376312920000145
as the action mask of the mth unmanned aerial vehicle at the tth moment, and using a computer, letting
Figure GDA0003376312920000146
and
Figure GDA0003376312920000147
wherein,
Figure GDA0003376312920000148
represents the masked power value of the mth unmanned aerial vehicle for the nth user at the tth moment,
Figure GDA0003376312920000149
represents the masked bandwidth value of the mth unmanned aerial vehicle for the nth user at the tth moment;
Step 308, using a computer, according to
Figure GDA00033763129200001410
obtaining the transmit power action component p_{m,n}(t) allocated by the mth unmanned aerial vehicle to the nth user at the tth moment;
using a computer, according to
Figure GDA00033763129200001411
obtaining the bandwidth resource action component b_{m,n}(t) allocated by the mth unmanned aerial vehicle to the nth user at the tth moment; wherein b_m(t) represents the bandwidth resources that the mth unmanned aerial vehicle can allocate at the tth moment, and
Figure GDA00033763129200001412
B_total represents the total bandwidth resource shared by all unmanned aerial vehicles, s_m(t) represents the total number of users accessing the mth unmanned aerial vehicle, and b_min represents the minimum divisible bandwidth;
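As a non-limiting illustration of steps 307 and 308, the sketch below applies a 0/1 action mask to the raw outputs of the unmanned aerial vehicle Actor network and then shares a resource budget among the unmasked users in proportion to their masked outputs. The formulas of steps 307 and 308 are contained in equation images not reproduced here, so the proportional normalization, the function name `masked_allocation` and the use of the access indicators σ_{m,n}(t) as mask entries are assumptions made for this example only.

```python
def masked_allocation(raw, mask, budget):
    """Share `budget` among users whose mask entry is 1.

    raw    -- per-user network outputs (non-negative)
    mask   -- per-user 0/1 entries, assumed equal to sigma_{m,n}(t)
    budget -- total power (or allocable bandwidth b_m(t)) of this UAV
    """
    masked = [r * s for r, s in zip(raw, mask)]
    total = sum(masked)
    if total <= 0:
        return [0.0] * len(raw)  # no user selected this UAV
    return [budget * v / total for v in masked]
```

For example, with raw outputs `[0.2, 0.5, 0.3]`, mask `[1, 0, 1]` and a budget of 10, only the first and third users receive a share, so the dimension of the effective allocation follows the number of accessing users.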
Step 309, using a computer, obtaining the action of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001413
And is
Figure GDA00033763129200001414
Wherein,
Figure GDA00033763129200001415
representing the flight azimuth angle of the mth unmanned aerial vehicle at the tth moment;
Step 30A, the observation state of the nth user at the tth moment
Figure GDA00033763129200001416
and the observation state of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001417
are merged and recorded as the observation state of the ith agent at the tth moment
Figure GDA00033763129200001418
wherein the agents comprise the M unmanned aerial vehicles and the N users, i is a positive integer, and
Figure GDA00033763129200001419
the action of the nth user selecting an unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001420
and the action of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001421
are merged and recorded as the action of the ith agent at the tth moment
Figure GDA00033763129200001422
for the action of the nth user selecting an unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001423
its probability
Figure GDA00033763129200001424
and, for the action of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000151
its probability
Figure GDA0003376312920000152
are merged and recorded as the action probability of the ith agent
Figure GDA0003376312920000153
Step four, acquiring global states of the unmanned aerial vehicle and the user:
Step 401, using a computer, according to the Shannon channel capacity, inputting p_{m,n}(t) and b_{m,n}(t) from step 309 to obtain the theoretical communication rate c_{m,n}(t) provided by the mth unmanned aerial vehicle to the nth user at the tth moment;
Step 402, using a computer according to
Figure GDA0003376312920000154
Obtaining the communication speed of the nth user at the t moment
Figure GDA0003376312920000155
Step 403, setting the global state of the mth unmanned aerial vehicle at the tth moment to be
Figure GDA0003376312920000156
And is
Figure GDA0003376312920000157
Step 404, setting global state of nth user at tth moment as
Figure GDA0003376312920000158
Figure GDA0003376312920000159
Wherein,
Figure GDA00033763129200001510
represents the coordinate positions under OXY, at the tth moment, of the users other than the nth user, n' is a positive integer, n' ≠ n, and
Figure GDA00033763129200001511
Step 405, the global state of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001512
and the global state of the nth user at the tth moment
Figure GDA00033763129200001513
are merged and recorded as the global state of the ith agent at the tth moment
Figure GDA00033763129200001514
Wherein i is a positive integer, and
Figure GDA00033763129200001515
step five, obtaining the rewards of the unmanned aerial vehicle and the user:
Step 501, using a computer, according to
Figure GDA00033763129200001516
obtaining the average communication rate c_mean(t) of the N users at the tth moment;
Step 502, using a computer, according to
Figure GDA00033763129200001517
obtaining the fairness index f_m(t) of the mth unmanned aerial vehicle at the tth moment;
Step 503, using a computer, according to
Figure GDA00033763129200001518
obtaining the reward of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001519
wherein r_d represents the reward coefficient of the unmanned aerial vehicle, κ_r is the exponent parameter of f_m(t), and
Figure GDA0003376312920000161
represents the boundary penalty term of the mth unmanned aerial vehicle at the tth moment;
Step 504, using a computer, according to
Figure GDA0003376312920000162
obtaining the reward of the nth user at the tth moment
Figure GDA0003376312920000163
wherein r_c represents the reward coefficient of the user;
Step 505, using a computer, the reward of the nth user at the tth moment
Figure GDA0003376312920000164
and the reward of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000165
are merged and recorded as the reward of the ith agent at the tth moment
Figure GDA0003376312920000166
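As a non-limiting illustration of the quantities used in steps 501 and 502, the sketch below computes the average communication rate of the users and a fairness index over a set of rates. The formula of the fairness index f_m(t) is contained in an equation image not reproduced here; Jain's fairness index, which is widely used for rate fairness and takes values in (0, 1], is substituted purely as an assumption, and the function names are hypothetical.

```python
def mean_rate(rates):
    """Average communication rate c_mean(t) over the given users."""
    return sum(rates) / len(rates)


def jain_fairness(rates):
    """Jain's index: 1.0 when all rates are equal, 1/len(rates) when one user takes all.

    Assumed stand-in for the patent's fairness index f_m(t).
    """
    total = sum(rates)
    squares = sum(r * r for r in rates)
    if squares == 0:
        return 0.0  # no user is served
    return total * total / (len(rates) * squares)
```

Under this assumption, raising the index to a power κ_r greater than 1, as the exponent parameter in step 503 suggests, would penalize unfair allocations more sharply.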
Step six, storing experience tuples:
Step 601, using a computer, taking
Figure GDA0003376312920000167
as the experience tuple of the ith agent at the tth moment and storing it in a buffer;
Step 602, repeating step three to step 601 to obtain the experience tuple of the next moment and store it in the buffer, until t = T_max, completing the data storage of one round; wherein T_max represents the total number of moments in each round;
Step 603, repeating step 602 to store the data of the next round until the number of experience tuples in the buffer reaches B, obtaining the first round of training data; wherein B is greater than T_max;
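As a non-limiting illustration of steps 601 to 603, the sketch below fills a buffer with experience tuples round by round until at least B tuples are stored. The fields of the tuple are given in an equation image not reproduced here; the field names below follow the merged quantities defined in steps 30A, 405 and 505 and are an assumption, as are the names `Experience`, `collect_training_data` and `step_fn` (a hypothetical callback that runs steps three to five for one moment).

```python
from collections import namedtuple

# Assumed tuple layout: observation, global state, action, action probability, reward.
Experience = namedtuple("Experience", "obs global_state action prob reward")


def collect_training_data(step_fn, t_max, batch_size):
    """Run whole rounds of t_max moments until the buffer holds >= batch_size tuples."""
    buffer = []
    while len(buffer) < batch_size:
        for t in range(1, t_max + 1):  # one round: moments 1..T_max
            buffer.append(step_fn(t))
    return buffer
```

This sketch always finishes the current round, so the buffer may slightly exceed B when B is not a multiple of T_max; truncating to exactly B tuples is an equally valid reading.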
Step seven, parameters of the MAPPO algorithm iterative optimization network model:
Step 701, inputting the first round of training data, and using a computer and the MAPPO algorithm to perform gradient-ascent optimization on the parameter φ of the unmanned aerial vehicle Actor network and the parameter θ of the user Actor network, obtaining the first-round optimized value of the parameter φ of the unmanned aerial vehicle Actor network and the first-round optimized value of the parameter θ of the user Actor network;
meanwhile, using a computer and the MAPPO algorithm to perform gradient-descent optimization on the parameter ω_1 of the unmanned aerial vehicle Critic network and the parameter ω_2 of the user Critic network, obtaining the first-round optimized value of the parameter ω_1 of the unmanned aerial vehicle Critic network and the first-round optimized value of the parameter ω_2 of the user Critic network;
Step 702, obtaining the next round of training data according to the method of step three to step 603;
Step 703, inputting the next round of training data and, according to the method of step 701, performing the next round of optimization with the previous round's optimized values as the initial parameter values, obtaining the next-round optimized value of the parameter φ of the unmanned aerial vehicle Actor network, the next-round optimized value of the parameter θ of the user Actor network, the next-round optimized value of the parameter ω_1 of the unmanned aerial vehicle Critic network and the next-round optimized value of the parameter ω_2 of the user Critic network;
Step 704, according to the method of step three to step 603, completing data storage for the set maximum number of rounds T_h, obtaining the P-th round of training data; wherein P is a positive integer;
Step 705, inputting the P-th round of training data and, according to the method of step 701, with the previous round's optimized values as the initial parameter values, obtaining the P-th round optimized value of the parameter φ of the unmanned aerial vehicle Actor network, the P-th round optimized value of the parameter θ of the user Actor network, the P-th round optimized value of the parameter ω_1 of the unmanned aerial vehicle Critic network and the P-th round optimized value of the parameter ω_2 of the user Critic network;
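As a non-limiting illustration of the objectives optimized in step 701, the sketch below evaluates the clipped surrogate objective commonly used by PPO-family algorithms, which the Actor networks would maximize by gradient ascent, and a mean-squared value loss, which the Critic networks would minimize by gradient descent. The source does not spell out these objectives, so the clipping threshold `clip_eps`, the precomputed advantage and return estimates, and the function names are assumptions; automatic differentiation is omitted.

```python
def clipped_surrogate(new_probs, old_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped objective over a batch of stored actions.

    old_probs are the action probabilities stored in the experience tuples;
    new_probs are the probabilities of the same actions under current parameters.
    """
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def value_loss(values, returns):
    """Mean squared error between Critic outputs and return targets."""
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
```

The ratio of new to old probability is the reason the action probabilities are recorded alongside the actions during data collection.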
step eight, optimizing and predicting the cooperative communication of multiple unmanned aerial vehicles and multiple users:
Step 801, obtaining the optimized network model according to the P-th round optimized value of the parameter φ of the unmanned aerial vehicle Actor network, the P-th round optimized value of the parameter θ of the user Actor network, the P-th round optimized value of the parameter ω_1 of the unmanned aerial vehicle Critic network and the P-th round optimized value of the parameter ω_2 of the user Critic network;
Step 802, acquiring the observation state of the nth user and the observation state of the mth unmanned aerial vehicle at a subsequent moment, and inputting them into the optimized network model to obtain the cooperative communication optimization action strategy of the mth unmanned aerial vehicle and the nth user at the subsequent moment.
In this embodiment, in step 401, using a computer, according to the Shannon channel capacity, p_{m,n}(t) and b_{m,n}(t) from step 309 are input to obtain the theoretical communication rate c_{m,n}(t) provided by the mth unmanned aerial vehicle to the nth user at the tth moment; the specific process is as follows:
step 4011, using computer according to formula
Figure GDA0003376312920000171
Obtaining LoS link probability from the mth unmanned aerial vehicle to the nth user at the tth moment
Figure GDA0003376312920000172
wherein a denotes a first environment-dependent constant, b denotes a second environment-dependent constant, and d_{m,n}(t) represents the straight-line distance from the mth unmanned aerial vehicle to the nth user at the tth moment;
step 4012, using computer according to formula
Figure GDA0003376312920000173
Obtaining the path loss from the mth unmanned aerial vehicle to the nth user at the tth moment under the LoS link
Figure GDA0003376312920000174
wherein ξ_LoS represents the additional loss under the LoS link, c represents the speed of light, and f_c represents the signal carrier frequency;
step 4013, adopting computer to calculate according to formula
Figure GDA0003376312920000181
Obtaining the path loss from the mth unmanned aerial vehicle to the nth user at the tth moment under the NLoS link
Figure GDA0003376312920000182
wherein ξ_NLoS represents the additional loss under the NLoS link;
step 4014, using computer according to formula
Figure GDA0003376312920000183
obtaining the path loss PL_{m,n}(t) of the signal from the mth unmanned aerial vehicle to the nth user; wherein,
Figure GDA0003376312920000184
the probability of NLoS link from the mth unmanned aerial vehicle to the nth user at the tth moment is represented, and
Figure GDA0003376312920000185
step 4015, using computer according to formula
Figure GDA0003376312920000186
obtaining the signal power received by the nth user from the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000187
Step 4016, using computer according to formula
Figure GDA0003376312920000188
obtaining the theoretical communication rate c_{m,n}(t) provided by the mth unmanned aerial vehicle to the nth user at the tth moment; wherein n_0 represents the power spectral density of Gaussian white noise in the channel.
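As a non-limiting illustration of steps 4011 to 4016, the sketch below chains a LoS probability, LoS and NLoS path losses, their probability-weighted average, the received power and the Shannon rate. The formulas themselves are contained in equation images not reproduced here; the sigmoid elevation-angle LoS model and the free-space path loss expression are widely used air-to-ground forms and are assumed for this example. The default values of `a`, `b`, `xi_los` and `xi_nlos` are illustrative picks inside the ranges stated in this embodiment, not values from the source.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light c, m/s


def los_probability(h, d, a, b):
    """Assumed sigmoid LoS model in the elevation angle (degrees); requires d >= h."""
    theta = math.degrees(math.asin(h / d))
    return 1.0 / (1.0 + a * math.exp(-b * (theta - a)))


def path_loss_db(d, f_c, xi_db):
    """Free-space path loss plus additional loss xi, in dB (assumed form)."""
    return 20.0 * math.log10(4.0 * math.pi * f_c * d / C_LIGHT) + xi_db


def rate_bps(p_mw, bw_hz, h, d, f_c=2e9, n0=1e-17,
             a=9.61, b=0.16, xi_los=1.0, xi_nlos=20.0):
    """Shannon rate c_{m,n}(t) for transmit power p_mw (mW) and bandwidth bw_hz (Hz)."""
    if p_mw <= 0.0 or bw_hz <= 0.0:
        return 0.0  # a masked user receives no resources
    p_los = los_probability(h, d, a, b)
    pl = p_los * path_loss_db(d, f_c, xi_los) \
        + (1.0 - p_los) * path_loss_db(d, f_c, xi_nlos)
    p_rx = p_mw * 10.0 ** (-pl / 10.0)  # received power, mW
    return bw_hz * math.log2(1.0 + p_rx / (bw_hz * n0))
```

With the noise density given in mW/Hz and the power in mW, the signal-to-noise ratio inside the logarithm is dimensionless, so no unit conversion is needed.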
In this embodiment, in step 4011, a is greater than 4.88 and less than 28, and b is greater than 0 and less than 1;
in step 4012 and step 4013, the additional loss ξ_NLoS under the NLoS link is greater than the additional loss ξ_LoS under the LoS link, the value range of ξ_LoS is (0 dB, 50 dB), and the value range of ξ_NLoS is (10 dB, 100 dB);
the value range of the user reward coefficient r_c in step 504 is 1 to 3;
the value range of the reward coefficient r_d of the unmanned aerial vehicle in step 503 is 1 to 5, and r_d is greater than r_c; the exponent parameter κ_r is a positive integer from 1 to 5.
In this embodiment, in step 503, the boundary penalty term of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000189
is obtained as follows:
Step 5031, setting the upper bound of the ground area D on the X axis as u_{max,x}, the upper bound of the ground area D on the Y axis as u_{max,y}, the lower bound of the ground area D on the X axis as u_{min,x}, and the lower bound of the ground area D on the Y axis as u_{min,y}; and u_{min,x} = u_{min,y} = 0;
Step 5032, adopting a computer to determine the position of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001810
Obtaining the X coordinate of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000191
And the Y coordinate of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000192
Step 5033, when
Figure GDA0003376312920000193
is greater than u_{max,x} or
Figure GDA0003376312920000194
is less than u_{min,x}, using a computer, according to
Figure GDA0003376312920000195
obtaining the boundary penalty term of the mth unmanned aerial vehicle at the tth moment
Figure GDA0003376312920000196
wherein r_b represents the penalty coefficient, κ_b represents the gradient factor that determines the smoothness of the boundary function, the value range of the penalty coefficient r_b is 10 to 50, and the value range of the gradient factor κ_b is 0.07 to 0.1;
when
Figure GDA0003376312920000197
is greater than u_{max,y} or
Figure GDA0003376312920000198
is less than u_{min,y}, using a computer, according to
Figure GDA0003376312920000199
obtaining the boundary penalty term of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001910
when
Figure GDA00033763129200001911
is greater than u_{max,x} and
Figure GDA00033763129200001912
is greater than u_{max,y}, or
Figure GDA00033763129200001913
is less than u_{min,x} and
Figure GDA00033763129200001914
is less than u_{min,y}, using a computer, according to
Figure GDA00033763129200001915
obtaining the boundary penalty term of the mth unmanned aerial vehicle at the tth moment
Figure GDA00033763129200001916
when
Figure GDA00033763129200001917
and
Figure GDA00033763129200001918
are both located within the ground area D,
Figure GDA00033763129200001919
In this embodiment, the value range of w in step 301 is 3 to 20;
in step 306, α_φ and β_φ satisfy α_φ ≥ 1 and β_φ ≥ 1.
In this embodiment, the value range of the maximum number of rounds T_h set in step 704 is 5000 to 6000;
the total number of rounds is
Figure GDA00033763129200001920
In this embodiment, the ground area D is a 2 km × 2 km square area, and the deployment height h of the M unmanned aerial vehicles relative to the ground area D is 500 m; at the start of each round, all unmanned aerial vehicles take off from the origin, and the users are randomly distributed in the area D and move in random directions at random speeds; T_max = 1000.
In this embodiment, the set maximum number of rounds T_h is 5000, and the value of B is 2000 to 4000.
In this embodiment, the value of w is 3.
In this embodiment, the total transmit power of each unmanned aerial vehicle is P_total = 10 mW, the total bandwidth resource shared by all unmanned aerial vehicles is B_total = 30 MHz, the signal carrier frequency is f_c = 2 GHz, the power spectral density of Gaussian white noise in the channel is n_0 = 1×10^-17 mW/Hz, and the minimum divisible bandwidth is b_min = 0.1 MHz.
In this embodiment, T_max = 1000 and each decision interval is 1 s, i.e. the interval between the tth moment and the (t+1)th moment is 1 s.
In this embodiment, in actual use, σ_{m,n}(t) = 1 indicates that the nth user selects the mth unmanned aerial vehicle as its access base station; otherwise σ_{m,n}(t) = 0.
In this embodiment, the user reward coefficient r_c is 1, the reward coefficient r_d of the unmanned aerial vehicle is 2, and the exponent parameter κ_r is 5.
In this embodiment, the penalty coefficient r_b is 20, and the gradient factor κ_b is 8×10^-2.
In this embodiment, α_φ = β_φ = 1.
In conclusion, the method provided by the invention has the advantages of simple steps and reasonable design, realizes the optimization of the flight azimuth angle, the power and the bandwidth distribution of the unmanned aerial vehicle through the optimization of the unmanned aerial vehicle and the user network model parameters, effectively adapts to the observation states of a plurality of unmanned aerial vehicles and a plurality of users to predict and output a reasonable cooperative communication optimization strategy, and realizes the maximization of the throughput of a communication system under the action of a multidimensional decision and meets the fairness of resource distribution.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all simple modifications, changes and equivalent structural changes made to the above embodiment according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (6)

1. A multi-unmanned aerial vehicle and user cooperative communication optimization method based on a MAPPO algorithm is characterized by comprising the following steps:
step one, establishing an unmanned aerial vehicle network model and a user network model:
Step 101, setting the parameters of the unmanned aerial vehicle Actor network as φ, the parameters of the unmanned aerial vehicle Critic network as ω_1, the parameters of the user Actor network as θ, and the parameters of the user Critic network as ω_2;
Step 102, setting the initial value of the parameter φ of the unmanned aerial vehicle Actor network to φ(0), the initial value of the parameter ω_1 of the unmanned aerial vehicle Critic network to ω_1(0), the initial value of the parameter θ of the user Actor network to θ(0), and the initial value of the parameter ω_2 of the user Critic network to ω_2(0); wherein φ(0), ω_1(0), θ(0) and ω_2(0) are obtained by orthogonal initialization of the neural networks;
step two, setting unmanned aerial vehicles and user scenes:
step 201, establishing a two-dimensional rectangular coordinate system OXY; wherein, the two-dimensional rectangular coordinate system is superposed with the ground area D;
step 202, setting N users in the ground area D, wherein the set of users is
Figure FDA0003376312910000011
Figure FDA0003376312910000012
Wherein, the position coordinate of the nth user at the tth moment is
Figure FDA0003376312910000013
n and N are positive integers, 1 ≤ n ≤ N, the ground area D is located in the first quadrant of OXY, the origin O coincides with the lower-left corner of the ground area D, and t is a positive integer;
Step 203, setting M unmanned aerial vehicles above the ground area D, wherein the set of unmanned aerial vehicles is
Figure FDA0003376312910000014
And is
Figure FDA0003376312910000015
The deployment heights of the M unmanned aerial vehicles relative to the ground area D are all h;
step three, acquiring the observation states of the unmanned aerial vehicle and the user:
step 301, setting the observation state of the nth user at the tth moment as
Figure FDA0003376312910000016
And is
Figure FDA0003376312910000017
Wherein,
Figure FDA0003376312910000018
indicating the coordinate position of the nth user at the time instant t,
Figure FDA0003376312910000019
represents the two-dimensional coordinate position under OXY, at the tth moment, of the mth unmanned aerial vehicle that the nth user can access, m and M are positive integers, and 1 ≤ m ≤ M; s_m(t−j) represents the number of users served by the mth unmanned aerial vehicle at the jth moment before the tth moment, j is a positive integer, and j = 1, …, w; w is a positive integer, and w < t;
step 302, the observation state of the nth user at the tth moment
Figure FDA0003376312910000021
is input into the user Actor network with initial parameter value θ(0), and the user Actor network outputs the pre-activation component x_m(θ(0)) of the mth unmanned aerial vehicle;
Step 303, using a computer to
Figure FDA0003376312910000022
Obtaining discrete probability distribution of action of the nth user selecting the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000023
Wherein exp (·) represents an exponential function with a natural constant e as the base,
Figure FDA0003376312910000024
representing the action of selecting the unmanned aerial vehicle by the nth user at the tth moment;
Step 304, the nth user at the tth moment, according to the discrete probability distribution
Figure FDA0003376312910000025
samples an action
Figure FDA0003376312910000026
and selects the corresponding unmanned aerial vehicle for access, and obtains, for the action of the nth user selecting an unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000027
its probability
Figure FDA0003376312910000028
Step 305, using a computer, according to the users' selections and the state of the unmanned aerial vehicle, setting the observation state of the mth unmanned aerial vehicle at the tth moment as
Figure FDA0003376312910000029
And is
Figure FDA00033763129100000210
Wherein,
Figure FDA00033763129100000211
the two-dimensional coordinate position of the mth unmanned aerial vehicle under the OXY at the tth moment is shown,
Figure FDA00033763129100000212
represents the coordinate positions under OXY, at the tth moment, of the unmanned aerial vehicles other than the mth unmanned aerial vehicle, m' is a positive integer, m' ≠ m, and
Figure FDA00033763129100000213
σ_{m,n}(t) represents the access status of the nth user with respect to the mth unmanned aerial vehicle;
Step 306, using a computer, inputting the observation state of the mth unmanned aerial vehicle at the tth moment
Figure FDA00033763129100000214
into the unmanned aerial vehicle Actor network with initial parameter value φ(0), the unmanned aerial vehicle Actor network outputting, given the observation state of the mth unmanned aerial vehicle at the tth moment
Figure FDA00033763129100000215
for the action of the mth unmanned aerial vehicle at the tth moment
Figure FDA00033763129100000216
the probability distribution
Figure FDA00033763129100000217
Wherein,
Figure FDA00033763129100000218
obeying a beta distribution, i.e.
Figure FDA00033763129100000219
αφAnd betaφAre all shape parameters of beta distribution;
Figure FDA00033763129100000220
the action of the mth unmanned aerial vehicle at the tth moment is shown;
sampling, according to
Figure FDA00033763129100000221
the action
Figure FDA00033763129100000222
to obtain the transmit power output value of the mth unmanned aerial vehicle for the nth user at the tth moment
Figure FDA0003376312910000031
the bandwidth output value of the mth unmanned aerial vehicle for the nth user at the tth moment
Figure FDA0003376312910000032
and the flight azimuth angle of the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000033
and acquiring, for the action of the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000034
its probability
Figure FDA0003376312910000035
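As an illustration of step 306, the sketch below samples each action component from a Beta distribution and accumulates the log-probability of the sampled action. It assumes the components are independent and that the Actor network has already produced one (alpha, beta) pair per component; the function names and example shape parameters are hypothetical. Mapping a sample from (0, 1) to a physical range (for example an azimuth angle) is left out.

```python
import math
import random


def beta_log_pdf(x, alpha, beta):
    """Log-density of Beta(alpha, beta) at x in (0, 1)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x)


def sample_beta_action(shape_params, rng):
    """Sample one value per (alpha, beta) pair; return the action and its log-probability."""
    action, log_prob = [], 0.0
    for alpha, beta in shape_params:
        x = rng.betavariate(alpha, beta)
        x = min(max(x, 1e-9), 1.0 - 1e-9)  # keep the logarithms finite
        action.append(x)
        log_prob += beta_log_pdf(x, alpha, beta)
    return action, log_prob


# Assumed example: power, bandwidth and azimuth components for one UAV.
rng = random.Random(1)
action, log_prob = sample_beta_action([(2.0, 3.0), (1.5, 1.5), (4.0, 2.0)], rng)
```

A Beta policy has bounded support, which suits actions such as power and bandwidth fractions that must stay within fixed limits.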
Step 307, using a computer to set
Figure FDA0003376312910000036
as the action mask of the mth unmanned aerial vehicle at the tth moment, and using a computer to let
Figure FDA0003376312910000037
and
Figure FDA0003376312910000038
wherein
Figure FDA0003376312910000039
denotes the masked power value of the mth unmanned aerial vehicle for the nth user at the tth moment,
Figure FDA00033763129100000310
denotes the masked bandwidth value of the mth unmanned aerial vehicle for the nth user at the tth moment;
Step 308, using a computer according to
Figure FDA00033763129100000311
to obtain the transmit power action component $p_{m,n}(t)$ allocated by the mth unmanned aerial vehicle to the nth user at the tth moment;
using a computer according to
Figure FDA00033763129100000312
to obtain the bandwidth resource action component $b_{m,n}(t)$ allocated by the mth unmanned aerial vehicle to the nth user at the tth moment; wherein $b_m(t)$ denotes the bandwidth resources that the mth unmanned aerial vehicle can allocate at the tth moment, and
Figure FDA00033763129100000313
$B_{total}$ denotes the total bandwidth resource shared by all unmanned aerial vehicles, $s_m(t)$ denotes the total number of users accessing the mth unmanned aerial vehicle, and $b_{min}$ denotes the minimum divisible bandwidth; $P_{total}$ denotes the total transmit power of each unmanned aerial vehicle;
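The masking and allocation of steps 307 and 308 might look like the following sketch for the power component. The actual formulas are only available as figures, so the rule used here is an assumption: raw Actor outputs are multiplied by a 0/1 access mask and the surviving values are normalized so that they share the total power. The function name `allocate_power` and the example values are hypothetical.

```python
def allocate_power(raw_outputs, access_flags, p_total):
    """Mask out users that are not attached, then split p_total proportionally.

    Assumed rule (not taken from the source figures): masked outputs are
    normalized so that the allocated powers sum to p_total.
    """
    masked = [value * flag for value, flag in zip(raw_outputs, access_flags)]
    total = sum(masked)
    if total <= 0.0:
        return [0.0] * len(masked)  # no attached user, nothing to allocate
    return [p_total * value / total for value in masked]


# Assumed example: three users, the second one is attached to another UAV.
powers = allocate_power([0.5, 0.2, 0.3], [1, 0, 1], p_total=10.0)
```

Masking before normalization ensures that no power is reserved for users served by a different unmanned aerial vehicle.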
Step 309, using a computer to obtain the action of the mth unmanned aerial vehicle at the tth moment
Figure FDA00033763129100000314
and
Figure FDA00033763129100000315
wherein
Figure FDA00033763129100000316
denotes the flight azimuth angle of the mth unmanned aerial vehicle at the tth moment;
Step 30A, merging the observation state of the nth user at the tth moment
Figure FDA00033763129100000317
and the observation state of the mth unmanned aerial vehicle at the tth moment
Figure FDA00033763129100000318
and recording the result as the observation state of the ith agent at the tth moment
Figure FDA00033763129100000319
wherein the agents comprise the M unmanned aerial vehicles and the N users, i is a positive integer, and
Figure FDA00033763129100000320
merging the action of selecting the unmanned aerial vehicle by the nth user at the tth moment
Figure FDA00033763129100000321
and the action of the mth unmanned aerial vehicle at the tth moment
Figure FDA00033763129100000322
and recording the result as the action of the ith agent at the tth moment
Figure FDA00033763129100000323
merging, for the action of selecting the unmanned aerial vehicle by the nth user at the tth moment
Figure FDA0003376312910000041
its probability
Figure FDA0003376312910000042
and, for the action of the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000043
its probability
Figure FDA0003376312910000044
and recording the result as the action probability of the ith agent
Step four, acquiring the global states of the unmanned aerial vehicles and the users:
Step 401, using a computer to input $p_{m,n}(t)$ and $b_{m,n}(t)$ from step 309 according to the Shannon channel capacity, to obtain the theoretical communication speed $c_{m,n}(t)$ provided by the mth unmanned aerial vehicle for the nth user at the tth moment;
Step 402, using a computer according to
Figure FDA0003376312910000046
to obtain the communication speed of the nth user at the tth moment
Figure FDA0003376312910000047
Step 403, setting the global state of the mth unmanned aerial vehicle at the tth moment as
Figure FDA0003376312910000048
and
Figure FDA0003376312910000049
Step 404, setting the global state of the nth user at the tth moment as
Figure FDA00033763129100000410
Figure FDA00033763129100000411
wherein
Figure FDA00033763129100000412
denotes the coordinate positions under OXY of the other users, excluding the nth user, at the tth moment, n' is a positive integer, n' ≠ n, and
Figure FDA00033763129100000413
Step 405, merging the global state of the mth unmanned aerial vehicle at the tth moment
Figure FDA00033763129100000414
and the global state of the nth user at the tth moment
Figure FDA00033763129100000415
and recording the result as the global state of the ith agent at the tth moment
Figure FDA00033763129100000416
wherein i is a positive integer, and
Figure FDA00033763129100000417
Step five, obtaining the rewards of the unmanned aerial vehicles and the users:
Step 501, using a computer according to
Figure FDA00033763129100000418
to obtain the average communication speed $c_{mean}(t)$ of the N users at the tth moment;
Step 502, using a computer according to
Figure FDA00033763129100000419
to obtain the fairness index $f_m(t)$ of the mth unmanned aerial vehicle at the tth moment;
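Steps 501 and 502 can be illustrated numerically. The averaging is straightforward; the fairness formula, however, is only available as a figure, so the sketch below substitutes Jain's fairness index, a commonly used measure, purely as an assumption. The function names are hypothetical.

```python
def mean_rate(rates):
    """Average communication speed over all users."""
    return sum(rates) / len(rates)


def jain_index(rates):
    """Jain's fairness index: 1.0 when all rates are equal, 1/N in the most unfair case.

    Used here as a stand-in; the source's own fairness formula is not recoverable.
    """
    square_sum = sum(r * r for r in rates)
    if square_sum == 0.0:
        return 0.0  # no user is served at all
    return sum(rates) ** 2 / (len(rates) * square_sum)


# Assumed example: rates of the users served by one UAV, in Mbit/s.
rates = [4.0, 2.0, 2.0]
average = mean_rate(rates)
fairness = jain_index(rates)
```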
Step 503, using a computer according to
Figure FDA00033763129100000420
to obtain the reward of the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000051
wherein $r_d$ denotes the reward factor of the unmanned aerial vehicle, $\kappa_r$ is the exponent parameter of $f_m(t)$,
Figure FDA0003376312910000052
denotes the boundary penalty term of the mth unmanned aerial vehicle at the tth moment;
Step 504, using a computer according to
Figure FDA0003376312910000053
to obtain the reward of the nth user at the tth moment
Figure FDA0003376312910000054
wherein $r_c$ denotes the reward factor of the user;
Step 505, using a computer to merge the reward of the nth user at the tth moment
Figure FDA0003376312910000055
and the reward of the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000056
and recording the result as the reward of the ith agent at the tth moment
Step six, storing experience tuples:
Step 601, using a computer to take
Figure FDA0003376312910000058
as the experience tuple of the ith agent at the tth moment and store it in a buffer;
Step 602, repeating step three to step 601 to obtain the experience tuple of the next moment and store it in the buffer, until $t = T_{max}$, at which point the data storage of one round is complete; wherein $T_{max}$ denotes the total number of moments per round;
Step 603, repeating step 602 to store the data of the next round until the number of experience tuples in the buffer reaches B, thereby obtaining the first round of training data; wherein B is greater than $T_{max}$
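The collection loop of step six can be sketched as follows. The exact tuple layout is given only as a figure; the five fields used here (observation, action, action probability, global state, reward) are an assumption based on the quantities merged in steps 30A, 405 and 505. `Experience`, `collect_training_data` and `step_fn` are hypothetical names, and `step_fn` stands in for one pass through steps three to five.

```python
from collections import namedtuple

# Assumed tuple layout; the source gives it only as a figure.
Experience = namedtuple("Experience", "obs action prob state reward")


def collect_training_data(step_fn, t_max, batch_size):
    """Run whole rounds of t_max moments until at least batch_size tuples are stored."""
    buffer = []
    while len(buffer) < batch_size:
        for t in range(1, t_max + 1):  # one round: t = 1 .. T_max
            buffer.append(step_fn(t))
    return buffer


# Assumed example: a dummy environment step that only records the time index.
data = collect_training_data(lambda t: Experience(t, 0, 1.0, t, 0.0), t_max=5, batch_size=12)
```

Because the sketch always finishes the current round, the buffer can hold slightly more than `batch_size` tuples when `batch_size` is not a multiple of `t_max`.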
Step seven, iteratively optimizing the parameters of the network model with the MAPPO algorithm:
Step 701, inputting the first round of training data, and using a computer to perform gradient-ascent optimization, with the MAPPO algorithm, on the parameter $\phi$ of the unmanned aerial vehicle Actor network and the parameter $\theta$ of the user Actor network, to obtain the first-round optimized value of the parameter $\phi$ of the unmanned aerial vehicle Actor network and the first-round optimized value of the parameter $\theta$ of the user Actor network;
meanwhile, using a computer to perform gradient-descent optimization, with the MAPPO algorithm, on the parameter $\omega_1$ of the unmanned aerial vehicle Critic network and the parameter $\omega_2$ of the user Critic network, to obtain the first-round optimized value of the parameter $\omega_1$ of the unmanned aerial vehicle Critic network and the first-round optimized value of the parameter $\omega_2$ of the user Critic network;
Step 702, obtaining the next round of training data according to the method from step three to step 603;
Step 703, inputting the next round of training data and, according to the method in step 701, performing the next round of optimization with the previous round's optimized values as the initial parameter values, to obtain the next-round optimized value of the parameter $\phi$ of the unmanned aerial vehicle Actor network, the next-round optimized value of the parameter $\theta$ of the user Actor network, the next-round optimized value of the parameter $\omega_1$ of the unmanned aerial vehicle Critic network, and the next-round optimized value of the parameter $\omega_2$ of the user Critic network;
Step 704, according to the method from step three to step 603, completing the data storage of the set maximum number of rounds $T_h$ to obtain the Pth round of training data; wherein P is a positive integer;
Step 705, inputting the Pth round of training data and, according to the method in step 701, with the previous round's optimized values as the initial parameter values, obtaining the Pth-round optimized value of the parameter $\phi$ of the unmanned aerial vehicle Actor network, the Pth-round optimized value of the parameter $\theta$ of the user Actor network, the Pth-round optimized value of the parameter $\omega_1$ of the unmanned aerial vehicle Critic network, and the Pth-round optimized value of the parameter $\omega_2$ of the user Critic network;
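The Actor update in step 701 relies on the action probabilities stored in step six. The source does not spell out the objective, so the sketch below shows the standard PPO clipped surrogate, which MAPPO applies per agent; the function name, the clipping value 0.2 and the use of precomputed advantages are assumptions. Gradient ascent on this quantity with respect to the Actor parameters would be done by an automatic differentiation library, which is omitted here.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped objective over a batch (to be maximized for the Actor)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # new policy probability / old policy probability
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return total / len(advantages)


# Assumed example: the policy has not changed yet, so the objective is the mean advantage.
objective = clipped_surrogate([0.0, 0.0], [0.0, 0.0], [1.0, 3.0])
```

Clipping the probability ratio limits how far one round of optimization can move the policy away from the one that generated the stored data.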
Step eight, optimizing and predicting the cooperative communication of multiple unmanned aerial vehicles and multiple users:
Step 801, obtaining an optimized network model according to the Pth-round optimized value of the parameter $\phi$ of the unmanned aerial vehicle Actor network, the Pth-round optimized value of the parameter $\theta$ of the user Actor network, the Pth-round optimized value of the parameter $\omega_1$ of the unmanned aerial vehicle Critic network, and the Pth-round optimized value of the parameter $\omega_2$ of the user Critic network;
Step 802, acquiring the observation state of the nth user and the observation state of the mth unmanned aerial vehicle at a subsequent moment, and inputting them into the optimized network model to obtain the cooperative communication optimization action strategy of the mth unmanned aerial vehicle and the nth user at the subsequent moment.
2. The method for optimizing cooperative communication between multiple unmanned aerial vehicles and users based on MAPPO algorithm according to claim 1, wherein: in step 401, using a computer to input $p_{m,n}(t)$ and $b_{m,n}(t)$ from step 309 according to the Shannon channel capacity, to obtain the theoretical communication speed $c_{m,n}(t)$ provided by the mth unmanned aerial vehicle for the nth user at the tth moment, the specific process is as follows:
Step 4011, using a computer according to the formula
Figure FDA0003376312910000061
to obtain the LoS link probability from the mth unmanned aerial vehicle to the nth user at the tth moment
Figure FDA0003376312910000062
wherein a denotes a first environment-related constant, b denotes a second environment-related constant, and $d_{m,n}(t)$ denotes the straight-line distance from the mth unmanned aerial vehicle to the nth user at the tth moment;
Step 4012, using a computer according to the formula
Figure FDA0003376312910000071
to obtain the path loss from the mth unmanned aerial vehicle to the nth user at the tth moment under the LoS link
Figure FDA0003376312910000072
wherein $\xi_{LoS}$ denotes the additional loss under the LoS link, c denotes the speed of light, and $f_c$ denotes the signal carrier frequency;
Step 4013, using a computer according to the formula
Figure FDA0003376312910000073
to obtain the path loss from the mth unmanned aerial vehicle to the nth user at the tth moment under the NLoS link
Figure FDA0003376312910000074
wherein $\xi_{NLoS}$ denotes the additional loss under the NLoS link;
Step 4014, using a computer according to the formula
Figure FDA0003376312910000075
to obtain the path loss $PL_{m,n}(t)$ of the signal from the mth unmanned aerial vehicle to the nth user; wherein
Figure FDA0003376312910000076
denotes the NLoS link probability from the mth unmanned aerial vehicle to the nth user at the tth moment, and
Figure FDA0003376312910000077
Step 4015, using a computer according to the formula
Figure FDA0003376312910000078
to obtain the power of the signal received by the nth user from the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000079
Step 4016, using a computer according to the formula
Figure FDA00033763129100000710
to obtain the theoretical communication speed $c_{m,n}(t)$ provided by the mth unmanned aerial vehicle for the nth user at the tth moment; wherein $n_0$ denotes the power spectral density of the Gaussian white noise in the channel.
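The chain of steps 4011 to 4016 can be illustrated with the widely used probabilistic air-to-ground channel model. The claim's formulas are only available as figures, so the expressions below (sigmoid LoS probability in the elevation angle, free-space loss plus an additional loss, probability-weighted mean path loss, Shannon capacity) are assumptions that match the quantities named in the claim rather than a transcription. The UAV altitude, the function names and all numeric values are hypothetical.

```python
import math

LIGHT_SPEED = 299_792_458.0  # speed of light in m/s


def los_probability(distance, altitude, a, b):
    """Assumed sigmoid LoS model in the elevation angle (degrees)."""
    elevation_deg = math.degrees(math.asin(min(1.0, altitude / distance)))
    return 1.0 / (1.0 + a * math.exp(-b * (elevation_deg - a)))


def mean_path_loss_db(distance, altitude, carrier_hz, a, b, xi_los_db, xi_nlos_db):
    """Probability-weighted path loss in dB over the LoS and NLoS links."""
    free_space = 20.0 * math.log10(4.0 * math.pi * carrier_hz * distance / LIGHT_SPEED)
    p_los = los_probability(distance, altitude, a, b)
    return p_los * (free_space + xi_los_db) + (1.0 - p_los) * (free_space + xi_nlos_db)


def shannon_rate(tx_power_w, bandwidth_hz, path_loss_db, noise_psd):
    """Shannon capacity in bit/s for the received power implied by the path loss."""
    if bandwidth_hz <= 0.0:
        return 0.0
    rx_power = tx_power_w * 10.0 ** (-path_loss_db / 10.0)
    return bandwidth_hz * math.log2(1.0 + rx_power / (noise_psd * bandwidth_hz))


# Assumed example: 200 m link, UAV at 100 m altitude, 2 GHz carrier.
loss_db = mean_path_loss_db(200.0, 100.0, 2e9, a=9.61, b=0.16, xi_los_db=1.0, xi_nlos_db=20.0)
rate = shannon_rate(tx_power_w=0.1, bandwidth_hz=1e6, path_loss_db=loss_db, noise_psd=4e-21)
```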
3. The method for optimizing cooperative communication between multiple unmanned aerial vehicles and users based on MAPPO algorithm according to claim 2, wherein: in step 4011, 4.88 < a < 28 and 0 < b < 1;
the additional loss $\xi_{NLoS}$ under the NLoS link in step 4012 and step 4013 is greater than the additional loss $\xi_{LoS}$ under the LoS link; the value range of the additional loss $\xi_{LoS}$ under the LoS link is (0 dB, 50 dB), and the value range of the additional loss $\xi_{NLoS}$ under the NLoS link is (10 dB, 100 dB);
the value range of the user reward factor $r_c$ in step 504 is 1 to 3;
the value range of the reward factor $r_d$ of the unmanned aerial vehicle in step 503 is 1 to 5, and $r_d$ is greater than $r_c$; the exponent parameter $\kappa_r$ is a positive integer in the range 1 to 5.
4. The method for optimizing cooperative communication between multiple unmanned aerial vehicles and users based on MAPPO algorithm according to claim 1, wherein: the boundary penalty term of the mth unmanned aerial vehicle at the tth moment in step 503
Figure FDA0003376312910000081
is obtained by the following specific process:
Step 5031, setting the upper bound of the ground area D on the X axis as $u_{max,x}$, the upper bound of the ground area D on the Y axis as $u_{max,y}$, the lower bound of the ground area D on the X axis as $u_{min,x}$, and the lower bound of the ground area D on the Y axis as $u_{min,y}$; and $u_{min,x} = u_{min,y} = 0$;
Step 5032, using a computer to obtain, from the position of the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000082
the X coordinate of the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000083
and the Y coordinate of the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000084
Step 5033, when
Figure FDA0003376312910000085
is greater than $u_{max,x}$ or
Figure FDA0003376312910000086
is less than $u_{min,x}$, using a computer according to
Figure FDA0003376312910000087
to obtain the boundary penalty term of the mth unmanned aerial vehicle at the tth moment
Figure FDA0003376312910000088
wherein $r_b$ denotes a penalty factor and $\kappa_b$ denotes a gradient factor that determines the smoothness of the boundary function; the value range of the penalty factor $r_b$ is 10 to 50, and the value range of the gradient factor $\kappa_b$ is 0.07 to 0.1;
when
Figure FDA0003376312910000089
is greater than $u_{max,y}$ or
Figure FDA00033763129100000810
is less than $u_{min,y}$, using a computer according to
Figure FDA00033763129100000811
to obtain the boundary penalty term of the mth unmanned aerial vehicle at the tth moment
Figure FDA00033763129100000812
when
Figure FDA00033763129100000813
is greater than $u_{max,x}$ and
Figure FDA00033763129100000814
is greater than $u_{max,y}$, or
Figure FDA00033763129100000815
is less than $u_{min,x}$ and
Figure FDA00033763129100000816
is less than $u_{min,y}$, using a computer according to
Figure FDA00033763129100000817
to obtain the boundary penalty term of the mth unmanned aerial vehicle at the tth moment
Figure FDA00033763129100000818
when
Figure FDA00033763129100000819
and
Figure FDA00033763129100000820
are both located within the ground area D,
Figure FDA00033763129100000821
5. The method for optimizing cooperative communication between multiple unmanned aerial vehicles and users based on MAPPO algorithm according to claim 1, wherein: in step 301, the value range of w is 3 to 20;
$\alpha_\phi$ and $\beta_\phi$ in step 306 satisfy: $\alpha_\phi \ge 1$, $\beta_\phi \ge 1$.
6. The method for optimizing cooperative communication between multiple unmanned aerial vehicles and users based on MAPPO algorithm according to claim 1, wherein: the value range of the maximum number of rounds $T_h$ set in step 704 is 5000 to 6000;
the total number of rounds
Figure FDA0003376312910000091
CN202110806485.3A 2021-07-16 2021-07-16 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm Active CN113359480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110806485.3A CN113359480B (en) 2021-07-16 2021-07-16 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110806485.3A CN113359480B (en) 2021-07-16 2021-07-16 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm

Publications (2)

Publication Number Publication Date
CN113359480A CN113359480A (en) 2021-09-07
CN113359480B true CN113359480B (en) 2022-02-01

Family

ID=77539837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110806485.3A Active CN113359480B (en) 2021-07-16 2021-07-16 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm

Country Status (1)

Country Link
CN (1) CN113359480B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114337785A (en) * 2021-12-30 2022-04-12 陕西锐远信息科技有限公司 Solar unmanned aerial vehicle communication energy management strategy, system, terminal and storage medium
CN114363340B (en) * 2022-01-12 2023-12-26 东南大学 Unmanned aerial vehicle cluster failure control method, system and storage medium
CN114895710A (en) * 2022-05-31 2022-08-12 中国人民解放军陆军工程大学 Control method and system for autonomous behavior of unmanned aerial vehicle cluster
CN114915998B (en) * 2022-05-31 2023-05-05 电子科技大学 Channel capacity calculation method for unmanned aerial vehicle auxiliary ad hoc network communication system
CN115484205B (en) * 2022-07-12 2023-12-01 北京邮电大学 Deterministic network routing and queue scheduling method and device
CN115494732B (en) * 2022-09-29 2024-04-12 湖南大学 Unmanned aerial vehicle track design and power distribution method based on near-end strategy optimization
CN118113482B (en) * 2024-04-26 2024-08-13 北京科技大学 Safe calculation unloading method and system for intelligent eavesdropper

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110404264A (en) * 2019-07-25 2019-11-05 哈尔滨工业大学(深圳) It is a kind of based on the virtually non-perfect information game strategy method for solving of more people, device, system and the storage medium self played a game
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN111880563A (en) * 2020-07-17 2020-11-03 西北工业大学 Multi-unmanned aerial vehicle task decision method based on MADDPG
WO2021033486A1 (en) * 2019-08-22 2021-02-25 オムロン株式会社 Model generation device, model generation method, control device, and control method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3725471A1 (en) * 2019-04-16 2020-10-21 Robert Bosch GmbH Configuring a system which interacts with an environment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
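Efficient Deployment With Geometric Analysis for mmWave UAV Communications; Jianwei Zhao et al.; IEEE Wireless Communications Letters; 2020-07-31; vol. 9, no. 7; pp. 1115-1119 *
A fairness-based intelligent resource scheduling method for UAV base station communication; Wu Guanhan et al.; ZTE Technology Journal (中兴通讯技术); 2021-04-30; vol. 27, no. 2; pp. 31-36 *
Distributed networking and access selection algorithm for UAV backbone networks; Wu Weiyu et al.; Chinese Journal of Computers (计算机学报); 2019-02-28; vol. 42, no. 2; pp. 121-137 *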

Also Published As

Publication number Publication date
CN113359480A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN113359480B (en) Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN111786713B (en) Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN108419286B (en) 5G unmanned aerial vehicle communication combined beam and power distribution method
CN111193536A (en) Multi-unmanned aerial vehicle base station track optimization and power distribution method
CN114169234B (en) Scheduling optimization method and system for unmanned aerial vehicle auxiliary mobile edge calculation
CN111970709B (en) Unmanned aerial vehicle relay deployment method and system based on particle swarm optimization algorithm
CN113660681B (en) Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
CN115278729B (en) Unmanned plane cooperation data collection and data unloading method in ocean Internet of things
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
CN112203289A (en) Aerial base station network deployment method for area coverage of cluster unmanned aerial vehicle
Hajiakhondi-Meybodi et al. Joint transmission scheme and coded content placement in cluster-centric UAV-aided cellular networks
CN113115344A (en) Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization
CN117499867A (en) Method for realizing high-energy-efficiency calculation and unloading through strategy gradient algorithm in multi-unmanned plane auxiliary movement edge calculation
CN113919483A (en) Method and system for constructing and positioning radio map in wireless communication network
CN112702713B (en) Low-altitude unmanned-machine communication deployment method under multi-constraint condition
CN116684852B (en) Mountain land metallocene forest environment unmanned aerial vehicle communication resource and hovering position planning method
CN116249202A (en) Combined positioning and computing support method for Internet of things equipment
CN116321181A (en) Online track and resource optimization method for multi-unmanned aerial vehicle auxiliary edge calculation
CN115765826A (en) Unmanned aerial vehicle network topology reconstruction method for on-demand service
CN114980205A (en) QoE (quality of experience) maximization method and device for multi-antenna unmanned aerial vehicle video transmission system
Li et al. Resource optimization for multi uav formation communication based on dqsenet
CN117750505A (en) Space-earth integrated slice network resource allocation method
CN117858105B (en) Multi-unmanned aerial vehicle cooperation set dividing and deploying method in complex electromagnetic environment
CN118764879A (en) RIS-assisted multi-unmanned aerial vehicle high-energy-efficiency fair communication coverage method
CN117479237A (en) Data caching and distributing method and system based on three-dimensional matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant