CN113359480B - Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm - Google Patents
- Publication number
- CN113359480B (application number CN202110806485.3A)
- Authority
- CN
- China
- Prior art keywords
- unmanned aerial
- aerial vehicle
- user
- mth
- tth moment
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention discloses a multi-unmanned aerial vehicle and user cooperative communication optimization method based on the MAPPO algorithm, comprising the following steps: first, establishing an unmanned aerial vehicle network model and a user network model; second, setting the unmanned aerial vehicle and user scene; third, acquiring the observation states of the unmanned aerial vehicles and users; fourth, acquiring their global states; fifth, obtaining the rewards of the unmanned aerial vehicles and users; sixth, storing experience tuples; seventh, iteratively optimizing the network model parameters with the MAPPO algorithm; and eighth, optimizing and predicting the cooperative communication between multiple unmanned aerial vehicles and multiple users. By optimizing the parameters of the unmanned aerial vehicle and user network models, the invention jointly optimizes the flight azimuth angle and the power and bandwidth allocation of the unmanned aerial vehicles, adapts to the observation states of multiple unmanned aerial vehicles and multiple users to predict a reasonable cooperative communication optimization strategy, maximizes the throughput of the communication system through multidimensional decisions, and maintains fairness in resource allocation.
Description
Technical Field
The invention belongs to the technical field of communication between unmanned aerial vehicles and users, and particularly relates to a multi-unmanned aerial vehicle and user cooperative communication optimization method based on the MAPPO algorithm.
Background
In current 5G mobile communication, the rapid development of emerging industries places a heavy data transmission load on the terrestrial backbone network, while many remote areas, limited by geographical conditions, still lack adequate wireless coverage. These unprecedented demands for high-quality wireless communication services pose significant challenges to traditional terrestrial communication networks. Using Unmanned Aerial Vehicles (UAVs) as aerial access nodes to assist ground communication in future 6G and beyond wireless networks has therefore become a promising solution.
As flying base stations, unmanned aerial vehicles offer high flexibility and freedom of movement and can provide wireless coverage to users across varied terrain. On the one hand, they can offload part of the computing load that overflows from the ground, relieving the computing and transmission pressure on ground base stations; on the other hand, they can flexibly adjust their coverage area to follow randomly moving Ground Users (GUs). Moreover, because the air-to-ground links of unmanned aerial vehicles usually have good line-of-sight conditions, the probability of non-line-of-sight blockage and shadowing is greatly reduced, which cuts unnecessary path loss to some extent; for the same Quality of Service (QoS), this prolongs the working time of an energy-limited unmanned aerial vehicle.
Existing work mainly optimizes unmanned aerial vehicle trajectories under a fixed communication resource allocation or with only a single type of communication resource allocated. Its optimization goals are limited to either unmanned aerial vehicle control or ground access control alone, and the problem has not been studied jointly from the perspectives of multiple unmanned aerial vehicles and multiple users.
Therefore, there is currently no MAPPO-based method for optimizing cooperative communication between multiple unmanned aerial vehicles and users that optimizes the parameters of the unmanned aerial vehicle and user network models to jointly optimize flight azimuth, power and bandwidth allocation, adapts to the observation states of multiple unmanned aerial vehicles and multiple users to predict a reasonable cooperative communication optimization strategy, maximizes system throughput through multidimensional decisions, and ensures fair resource allocation.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a MAPPO-based cooperative communication optimization method for multiple unmanned aerial vehicles and users that has simple steps and a reasonable design. By optimizing the unmanned aerial vehicle and user network model parameters, the method optimizes the flight azimuth angle and the power and bandwidth allocation of the unmanned aerial vehicles, adapts to the observation states of multiple unmanned aerial vehicles and multiple users to predict a reasonable cooperative communication optimization strategy, maximizes the throughput of the communication system through multidimensional decisions, and maintains fairness in resource allocation.
In order to solve the above technical problem, the invention adopts the following technical solution: a multi-unmanned aerial vehicle and user cooperative communication optimization method based on the MAPPO algorithm, characterized by comprising the following steps:
step one, establishing an unmanned aerial vehicle network model and a user network model:
step 101, setting the parameter of the unmanned aerial vehicle Actor network as $\phi$, the parameter of the unmanned aerial vehicle Critic network as $\omega_1$, the parameter of the user Actor network as $\theta$, and the parameter of the user Critic network as $\omega_2$;
Step 102, setting the initial value of the unmanned aerial vehicle Actor network parameter $\phi$ to $\phi(0)$, the initial value of the unmanned aerial vehicle Critic network parameter $\omega_1$ to $\omega_1(0)$, the initial value of the user Actor network parameter $\theta$ to $\theta(0)$, and the initial value of the user Critic network parameter $\omega_2$ to $\omega_2(0)$; wherein $\phi(0)$, $\omega_1(0)$, $\theta(0)$ and $\omega_2(0)$ all satisfy orthogonal initialization of the neural networks;
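Step 102 only states that the initial parameters satisfy orthogonal initialization. As a dependency-free sketch (an assumption, not the patent's implementation), an orthogonal weight matrix can be produced by drawing a Gaussian matrix and orthonormalizing it with modified Gram-Schmidt; the function name `orthogonal_init` and its `gain` argument are illustrative. Deep learning frameworks provide this directly, for example `torch.nn.init.orthogonal_` in PyTorch.

```python
import math
import random


def orthogonal_init(rows, cols, gain=1.0, seed=None):
    """Return a rows x cols weight matrix (list of lists) that has orthonormal
    rows when rows <= cols, or orthonormal columns when rows > cols, scaled by gain.

    Sketch only: a Gaussian draw orthonormalized with modified Gram-Schmidt.
    """
    rng = random.Random(seed)
    # Orthonormalize along the smaller dimension.
    n_vec, dim = (rows, cols) if rows <= cols else (cols, rows)
    basis = []
    while len(basis) < n_vec:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:
            # Remove the component of v along each accepted basis vector.
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-10:  # skip a (near-)degenerate draw and try again
            basis.append([x / norm for x in v])
    if rows <= cols:
        mat = basis
    else:
        # Basis vectors become the columns of a tall matrix.
        mat = [[basis[j][i] for j in range(cols)] for i in range(rows)]
    return [[gain * x for x in row] for row in mat]
```

Each of the four networks would call such an initializer once per weight matrix to obtain $\phi(0)$, $\omega_1(0)$, $\theta(0)$ and $\omega_2(0)$.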
step two, setting unmanned aerial vehicles and user scenes:
step 201, establishing a two-dimensional rectangular coordinate system OXY whose plane coincides with the ground area D;
step 202, setting N users in the ground area D, the user set consisting of the users indexed by $n = 1, \dots, N$, where the position coordinate of the nth user at the tth moment is its position in OXY; n and N are positive integers with $1 \le n \le N$; the ground area D lies in the first quadrant of OXY with the origin O at its lower-left corner; and t is a positive integer;
step 203, setting M unmanned aerial vehicles above the ground area D, which form the unmanned aerial vehicle set indexed by $m = 1, \dots, M$; all M unmanned aerial vehicles are deployed at the same height h above the ground area D;
step three, acquiring the observation states of the unmanned aerial vehicle and the user:
step 301, setting the observation state of the nth user at the tth moment, which comprises: the coordinate position of the nth user at the tth moment; the two-dimensional coordinate positions in OXY of the mth unmanned aerial vehicles that the nth user can access at the tth moment, where m and M are positive integers and $1 \le m \le M$; and $s_m(t-j)$, the number of users served by the mth unmanned aerial vehicle at the jth moment before the tth moment, where j is a positive integer, $j = 1, \dots, w$, and w is a positive integer with $w < t$;
step 302, inputting the observation state of the nth user at the tth moment into the user Actor network with initial parameter value $\theta(0)$, which outputs the pre-activation component $x_m(\theta(0))$ for the mth unmanned aerial vehicle;
Step 303, using a computer to obtain, via the softmax $\frac{\exp(x_m(\theta(0)))}{\sum_{k=1}^{M}\exp(x_k(\theta(0)))}$, the discrete probability distribution of the nth user's action of selecting the mth unmanned aerial vehicle at the tth moment, wherein $\exp(\cdot)$ denotes the exponential function with the natural constant e as its base;
step 304, the nth user at the tth moment samples its drone-selection action from this discrete probability distribution, accesses the corresponding unmanned aerial vehicle, and records the probability of the sampled action;
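Steps 303 and 304 amount to a softmax over the M pre-activation components followed by categorical sampling. The sketch below assumes the network outputs are available as plain Python floats and uses 0-based drone indices; `select_drone` is a hypothetical helper, not part of the source.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: exp(x_m) / sum_k exp(x_k)."""
    peak = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_drone(logits, rng=random):
    """Sample the drone index a user accesses; return (index, probability).

    `logits` stands for the M pre-activation components x_m output by the
    user Actor network; index m here corresponds to drone m + 1.
    """
    probs = softmax(logits)
    m = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return m, probs[m]
```

The returned probability is the quantity step 304 records; it is later needed for the probability ratio in the MAPPO update.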
step 305, according to the users' selections and the states of the unmanned aerial vehicles, using a computer to set the observation state of the mth unmanned aerial vehicle at the tth moment, which comprises: the two-dimensional coordinate position of the mth unmanned aerial vehicle in OXY at the tth moment; the coordinate positions in OXY of the other unmanned aerial vehicles m' at the tth moment, where m' is a positive integer and $m' \ne m$; and $\sigma_{m,n}(t)$, the status indicating whether the nth user accesses the mth unmanned aerial vehicle;
step 306, using a computer to input the observation state of the mth unmanned aerial vehicle at the tth moment into the unmanned aerial vehicle Actor network with initial parameter value $\phi(0)$, which outputs the probability distribution of the action of the mth unmanned aerial vehicle at the tth moment given that observation state; the action obeys a Beta distribution $\mathrm{Beta}(\alpha_\phi, \beta_\phi)$, where $\alpha_\phi$ and $\beta_\phi$ are the shape parameters of the Beta distribution;
sampling an action from this distribution to obtain the transmit power output value of the mth unmanned aerial vehicle for the nth user at the tth moment, the bandwidth output value of the mth unmanned aerial vehicle for the nth user at the tth moment, the flight azimuth angle of the mth unmanned aerial vehicle at the tth moment, and the probability of the sampled action of the mth unmanned aerial vehicle at the tth moment;
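Step 306 draws the unmanned aerial vehicle's action from a Beta distribution, whose bounded support suits bounded quantities such as the flight azimuth angle. The sketch below covers only the azimuth component; it assumes the Actor network has already produced shape parameters $\alpha_\phi, \beta_\phi \ge 1$, and it assumes an azimuth range of $[0, 2\pi]$, which the source does not state. The power and bandwidth output values could be sampled the same way. The log-density is what a PPO-style update needs for its probability ratio.

```python
import math
import random


def beta_log_prob(x, alpha, beta):
    """Log-density of Beta(alpha, beta) at x in (0, 1)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x)


def sample_azimuth(alpha, beta, rng=random):
    """Draw u ~ Beta(alpha, beta) and map it to an azimuth in [0, 2*pi].

    Returns (azimuth, log_prob), where log_prob is the log-density of u.
    The [0, 2*pi] range is an assumption; the angle needs no clipping
    because the Beta sample is already bounded.
    """
    u = rng.betavariate(alpha, beta)
    # Keep u strictly inside (0, 1) so the log-density stays finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return 2.0 * math.pi * u, beta_log_prob(u, alpha, beta)
```

Requiring $\alpha_\phi, \beta_\phi \ge 1$ keeps the density unimodal and finite at the interval ends, which is one reason this constraint is commonly imposed on Beta policies.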
Step 307, using a computer to set the action mask of the mth unmanned aerial vehicle at the tth moment according to which users access it, and to apply the mask to the power and bandwidth output values, obtaining the power value with which the mth unmanned aerial vehicle masks the nth user at the tth moment and the bandwidth value with which the mth unmanned aerial vehicle masks the nth user at the tth moment;
step 308, using a computer to obtain, from the masked power values, the transmit power action component $p_{m,n}(t)$ allocated by the mth unmanned aerial vehicle to the nth user at the tth moment;
using a computer to obtain, from the masked bandwidth values, the bandwidth resource action component $b_{m,n}(t)$ allocated by the mth unmanned aerial vehicle to the nth user at the tth moment; wherein $b_m(t)$ denotes the bandwidth resource that the mth unmanned aerial vehicle can allocate at the tth moment and is determined by $B_{\mathrm{total}}$, the total bandwidth resource shared by all unmanned aerial vehicles, $s_m(t)$, the total number of users accessing the mth unmanned aerial vehicle, and $b_{\min}$, the minimum separable bandwidth;
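The source does not reproduce the formulas of steps 307 and 308. Purely to illustrate how an action mask confines allocation to the accessing users, the sketch below assumes proportional allocation: raw outputs are multiplied by the access indicators $\sigma_{m,n}(t)$, normalized, and scaled to a budget (a transmit-power limit, or the allocatable bandwidth $b_m(t)$). The proportional rule, the even-split fallback, and the name `masked_allocation` are assumptions, and the sketch does not model the minimum separable bandwidth $b_{\min}$.

```python
def masked_allocation(raw, mask, budget):
    """Split `budget` across users in proportion to masked raw action outputs.

    raw:    non-negative raw outputs for every user n (e.g. Beta samples in [0, 1])
    mask:   1 if user n accesses this drone (sigma_{m,n}(t) = 1), else 0
    budget: resource to split, e.g. a power limit or b_m(t) for bandwidth
    """
    masked = [r * s for r, s in zip(raw, mask)]
    total = sum(masked)
    if total <= 0.0:
        # All outputs zero (or nobody accessing): split evenly over accessing users.
        count = sum(mask)
        return [budget * s / count if count else 0.0 for s in mask]
    return [budget * x / total for x in masked]
```

Because masked users receive exactly zero, the effective dimensionality of each drone's allocation varies with $s_m(t)$ even though the network output has a fixed size N.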
step 309, using a computer to obtain the action of the mth unmanned aerial vehicle at the tth moment, composed of the power components $p_{m,n}(t)$, the bandwidth components $b_{m,n}(t)$ and the flight azimuth angle of the mth unmanned aerial vehicle at the tth moment;
step 30A, merging the observation state of the nth user at the tth moment and the observation state of the mth unmanned aerial vehicle at the tth moment into the observation state of the ith agent at the tth moment, where the agents comprise the M unmanned aerial vehicles and the N users, and i is a positive integer with $1 \le i \le M + N$;
merging the nth user's drone-selection action at the tth moment and the action of the mth unmanned aerial vehicle at the tth moment into the action of the ith agent at the tth moment;
merging the probability of the nth user's drone-selection action at the tth moment and the probability of the mth unmanned aerial vehicle's action at the tth moment into the action probability of the ith agent;
Step four, acquiring global states of the unmanned aerial vehicle and the user:
step 401, using a computer to input $p_{m,n}(t)$ and $b_{m,n}(t)$ from step 309 into the Shannon channel capacity formula to obtain the theoretical communication rate $c_{m,n}(t)$ provided by the mth unmanned aerial vehicle to the nth user at the tth moment;
Step 402, using a computer to obtain the communication rate of the nth user at the tth moment from the rates $c_{m,n}(t)$ and the access status $\sigma_{m,n}(t)$;
Step 404, setting the global state of the nth user at the tth moment, which includes the coordinate positions in OXY of the other users n' at the tth moment, where n' is a positive integer and $n' \ne n$;
step 405, merging the global state of the mth unmanned aerial vehicle at the tth moment and the global state of the nth user at the tth moment into the global state of the ith agent at the tth moment, where i is a positive integer with $1 \le i \le M + N$;
step five, obtaining the rewards of the unmanned aerial vehicle and the user:
step 501, using a computer to obtain the average communication rate $c_{\mathrm{mean}}(t)$ of the N users at the tth moment, i.e. the mean of the N users' communication rates;
Step 502, using a computer to obtain the fairness index $f_m(t)$ of the mth unmanned aerial vehicle at the tth moment;
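Step 501 is a plain average. For step 502 the source does not give the fairness formula; a common choice, assumed here rather than taken from the patent, is Jain's fairness index computed over the rates of the users served by the mth unmanned aerial vehicle. Returning 1.0 for a drone that serves no users is likewise an assumption.

```python
def mean_rate(rates):
    """c_mean(t): average communication rate over all N users."""
    return sum(rates) / len(rates)


def jain_fairness(rates):
    """Jain's fairness index (sum x)^2 / (k * sum x^2), which lies in [1/k, 1].

    Assumed stand-in for f_m(t), computed over the rates of the k users
    served by drone m; returns 1.0 when there is nothing to compare.
    """
    k = len(rates)
    sq = sum(x * x for x in rates)
    if k == 0 or sq == 0.0:
        return 1.0
    return sum(rates) ** 2 / (k * sq)
```

The index equals 1 when all served users get the same rate and falls toward $1/k$ as one user dominates, so raising it to a power such as $\kappa_r$ sharpens the penalty on unfair allocations.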
Step 503, using a computer to obtain the reward of the mth unmanned aerial vehicle at the tth moment, wherein $r_d$ denotes the reward factor of the unmanned aerial vehicle, $\kappa_r$ is the exponent parameter of $f_m(t)$, and the reward includes the boundary penalty term of the mth unmanned aerial vehicle at the tth moment;
step 504, using a computer to obtain the reward of the nth user at the tth moment, wherein $r_c$ denotes the reward factor of the user;
step 505, using a computer to merge the reward of the nth user at the tth moment and the reward of the mth unmanned aerial vehicle at the tth moment into the reward of the ith agent at the tth moment;
Step six, storing experience tuples:
step 601, using a computer to store the experience tuple of the ith agent at the tth moment in a buffer;
step 602, repeating step three through step 601 to obtain and store the experience tuples of the next moment until $t = T_{\max}$, which completes the data storage of one round; wherein $T_{\max}$ denotes the total number of moments per round;
step 603, repeating step 602 to store the data of subsequent rounds until the buffer holds B experience tuples, giving the first round of training data; wherein $B > T_{\max}$;
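Steps 601 to 603 fill a buffer round by round until it holds B experience tuples. A minimal sketch, assuming a hypothetical callback `step_fn(t)` that returns the experience tuples of all agents at moment t; environment resets between rounds are not modeled, and stopping exactly at B tuples (possibly mid-round) is one reading of step 603.

```python
def collect(capacity, t_max, step_fn):
    """Gather experience tuples round by round until `capacity` (B) are stored.

    step_fn(t) is a hypothetical callback returning a list of experience
    tuples, one per agent, for moment t of the current round.
    """
    buffer = []
    while len(buffer) < capacity:
        for t in range(1, t_max + 1):  # one round of T_max moments
            buffer.extend(step_fn(t))
            if len(buffer) >= capacity:
                break
    return buffer[:capacity]
```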
Step seven, iteratively optimizing the network model parameters with the MAPPO algorithm:
step 701, inputting the first round of training data and using a computer to optimize, via the MAPPO algorithm, the parameter $\phi$ of the unmanned aerial vehicle Actor network and the parameter $\theta$ of the user Actor network by gradient ascent, obtaining the first-round optimized values of $\phi$ and $\theta$;
meanwhile, using a computer to optimize, via the MAPPO algorithm, the parameter $\omega_1$ of the unmanned aerial vehicle Critic network and the parameter $\omega_2$ of the user Critic network by gradient descent, obtaining the first-round optimized values of $\omega_1$ and $\omega_2$;
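Step 701 leaves the MAPPO update itself implicit. As a sketch of the standard PPO-family quantities that MAPPO builds on (assumed, not reproduced from the patent), the Actor parameters ascend a clipped surrogate objective built from the stored action probabilities, and the Critic parameters descend a squared value error. Advantages here come from generalized advantage estimation, which is also an assumption; `eps`, `gamma` and `lam` are illustrative defaults, and gradient computation is left to an autodiff framework.

```python
import math


def gae(rewards, values, next_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one agent's trajectory."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nv = values[t + 1] if t + 1 < len(values) else next_value
        delta = rewards[t] + gamma * nv - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """PPO clipped objective, maximised with respect to the Actor parameters."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|o) / pi_old(a|o)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def critic_loss(values, returns):
    """Mean squared error, minimised with respect to the Critic parameters."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)
```

In the MAPPO setting the Critic would take the global state from step four as input, while each Actor sees only its own observation, which is what allows decentralized execution after centralized training.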
step 702, obtaining the next round of training data according to the method from step three to step 603;
step 703, inputting the next round of training data and, following the method in step 701 with the previous round's optimized values as initial parameter values, performing the next round of updates to obtain the next-round optimized values of the unmanned aerial vehicle Actor network parameter $\phi$, the user Actor network parameter $\theta$, the unmanned aerial vehicle Critic network parameter $\omega_1$ and the user Critic network parameter $\omega_2$;
step 704, repeating the method from step three to step 603 until the set maximum number of rounds $T_h$ is reached, completing data storage and obtaining the Pth round of training data; wherein P is a positive integer;
step 705, inputting the Pth round of training data and, following the method in step 701 with the previous round's optimized values as initial parameter values, obtaining the Pth-round optimized values of the unmanned aerial vehicle Actor network parameter $\phi$, the user Actor network parameter $\theta$, the unmanned aerial vehicle Critic network parameter $\omega_1$ and the user Critic network parameter $\omega_2$;
step eight, optimizing and predicting the cooperative communication of multiple unmanned aerial vehicles and multiple users:
step 801, obtaining the optimized network model from the Pth-round optimized values of the unmanned aerial vehicle Actor network parameter $\phi$, the user Actor network parameter $\theta$, the unmanned aerial vehicle Critic network parameter $\omega_1$ and the user Critic network parameter $\omega_2$;
step 802, acquiring the observation state of the nth user and the observation state of the mth unmanned aerial vehicle at subsequent moments and inputting them into the optimized network model to obtain the cooperative communication optimization action strategy of the mth unmanned aerial vehicle and the nth user at those moments.
The MAPPO algorithm-based multi-unmanned aerial vehicle and user cooperative communication optimization method is characterized in that, in step 401, a computer inputs $p_{m,n}(t)$ and $b_{m,n}(t)$ from step 309 into the Shannon channel capacity formula to obtain the theoretical communication rate $c_{m,n}(t)$ provided by the mth unmanned aerial vehicle to the nth user at the tth moment, as follows:
step 4011, using a computer to obtain the LoS link probability $P^{\mathrm{LoS}}_{m,n}(t)$ from the mth unmanned aerial vehicle to the nth user at the tth moment, wherein a denotes a first environment-dependent constant, b denotes a second environment-dependent constant, and $d_{m,n}(t)$ denotes the straight-line distance from the mth unmanned aerial vehicle to the nth user at the tth moment;
step 4012, using a computer to obtain the path loss from the mth unmanned aerial vehicle to the nth user at the tth moment under the LoS link, $PL^{\mathrm{LoS}}_{m,n}(t) = 20\log_{10}\left(\frac{4\pi f_c d_{m,n}(t)}{c}\right) + \xi_{\mathrm{LoS}}$, wherein $\xi_{\mathrm{LoS}}$ denotes the additional loss under the LoS link, c denotes the speed of light, and $f_c$ denotes the signal carrier frequency;
step 4013, using a computer to obtain the path loss from the mth unmanned aerial vehicle to the nth user at the tth moment under the NLoS link, $PL^{\mathrm{NLoS}}_{m,n}(t) = 20\log_{10}\left(\frac{4\pi f_c d_{m,n}(t)}{c}\right) + \xi_{\mathrm{NLoS}}$, wherein $\xi_{\mathrm{NLoS}}$ denotes the additional loss under the NLoS link;
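step 4014, using a computer to obtain the path loss of the signal from the mth unmanned aerial vehicle to the nth user, $PL_{m,n}(t) = P^{\mathrm{LoS}}_{m,n}(t)\,PL^{\mathrm{LoS}}_{m,n}(t) + P^{\mathrm{NLoS}}_{m,n}(t)\,PL^{\mathrm{NLoS}}_{m,n}(t)$, where $P^{\mathrm{LoS}}_{m,n}(t)$ is the LoS link probability from step 4011, $PL^{\mathrm{LoS}}_{m,n}(t)$ and $PL^{\mathrm{NLoS}}_{m,n}(t)$ are the path losses from steps 4012 and 4013, and $P^{\mathrm{NLoS}}_{m,n}(t) = 1 - P^{\mathrm{LoS}}_{m,n}(t)$ is the NLoS link probability from the mth unmanned aerial vehicle to the nth user at the tth moment;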
step 4015, using a computer to obtain, from the transmit power $p_{m,n}(t)$ and the path loss $PL_{m,n}(t)$, the signal power $P^{\mathrm{r}}_{m,n}(t)$ received by the nth user from the mth unmanned aerial vehicle at the tth moment;
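Step 4016, using a computer to obtain the theoretical communication rate provided by the mth unmanned aerial vehicle to the nth user at the tth moment, $c_{m,n}(t) = b_{m,n}(t)\log_2\left(1 + \frac{P^{\mathrm{r}}_{m,n}(t)}{n_0\, b_{m,n}(t)}\right)$, where $P^{\mathrm{r}}_{m,n}(t)$ is the received signal power from step 4015 and $n_0$ denotes the power spectral density of the Gaussian white noise in the channel.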
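Steps 4011 to 4016 chain a probabilistic LoS model, the LoS and NLoS path losses, the received power, and the Shannon rate. The sketch below assumes the widely used elevation-angle LoS model $1/(1 + a\exp(-b(\theta - a)))$ with $\theta = \frac{180}{\pi}\arcsin(h/d_{m,n}(t))$ in degrees, free-space path loss plus an additional loss per link type, and linear-scale received power $p_{m,n}(t)\cdot 10^{-PL_{m,n}(t)/10}$. These forms are assumptions consistent with the parameters the source names, not a verbatim copy of its formulas. Units are SI (W, Hz, m, W/Hz).

```python
import math

SPEED_OF_LIGHT = 3e8  # m/s


def los_probability(a, b, h, d):
    """Assumed elevation-angle model: 1 / (1 + a*exp(-b*(theta - a))),
    where theta = (180/pi) * asin(h/d) is the elevation angle in degrees."""
    theta = math.degrees(math.asin(min(h / d, 1.0)))
    return 1.0 / (1.0 + a * math.exp(-b * (theta - a)))


def mean_path_loss_db(a, b, h, d, fc, xi_los, xi_nlos):
    """PL_{m,n}(t) in dB: free-space loss plus LoS/NLoS additional loss,
    weighted by the link probabilities, with P_NLoS = 1 - P_LoS."""
    fspl = 20 * math.log10(4 * math.pi * fc * d / SPEED_OF_LIGHT)
    p_los = los_probability(a, b, h, d)
    return p_los * (fspl + xi_los) + (1.0 - p_los) * (fspl + xi_nlos)


def shannon_rate(p_tx_w, b_hz, pl_db, n0_w_per_hz):
    """c_{m,n}(t) = b * log2(1 + P_rx / (n0 * b)), P_rx = p * 10^(-PL/10)."""
    if b_hz <= 0:
        return 0.0  # a masked user gets no bandwidth and hence no rate
    p_rx = p_tx_w * 10 ** (-pl_db / 10)
    return b_hz * math.log2(1 + p_rx / (n0_w_per_hz * b_hz))
```

For example, `shannon_rate(0.1, 1e6, mean_path_loss_db(9.61, 0.16, 100.0, 150.0, 2e9, 1.0, 20.0), 4e-21)` evaluates the rate for one drone-user pair; the constants there are illustrative values chosen within the ranges stated later for a, b and the additional losses.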
The MAPPO algorithm-based multi-unmanned aerial vehicle and user cooperative communication optimization method is characterized in that: in step 4011, $4.88 < a < 28$ and $0 < b < 1$;
the additional loss $\xi_{\mathrm{NLoS}}$ under the NLoS link in steps 4012 and 4013 is greater than the additional loss $\xi_{\mathrm{LoS}}$ under the LoS link; $\xi_{\mathrm{LoS}}$ takes values in (0 dB, 50 dB) and $\xi_{\mathrm{NLoS}}$ takes values in (10 dB, 100 dB);
the user reward factor $r_c$ in step 504 ranges from 1 to 3;
the reward factor $r_d$ of the unmanned aerial vehicle in step 503 ranges from 1 to 5, with $r_d > r_c$; the exponent parameter $\kappa_r$ is a positive integer from 1 to 5.
The MAPPO algorithm-based multi-unmanned aerial vehicle and user cooperative communication optimization method is characterized in that the boundary penalty term of the mth unmanned aerial vehicle at the tth moment in step 503 is obtained as follows:
step 5031, setting the upper bound of the ground area D on the X axis as $u_{\max,x}$, its upper bound on the Y axis as $u_{\max,y}$, its lower bound on the X axis as $u_{\min,x}$ and its lower bound on the Y axis as $u_{\min,y}$, with $u_{\min,x} = u_{\min,y} = 0$;
Step 5032, using a computer to obtain, from the position of the mth unmanned aerial vehicle at the tth moment, its X coordinate and its Y coordinate at the tth moment;
Step 5033, when the X coordinate of the mth unmanned aerial vehicle at the tth moment is greater than $u_{\max,x}$ or less than $u_{\min,x}$, using a computer to compute the boundary penalty term of the mth unmanned aerial vehicle at the tth moment with the corresponding X-axis formula, wherein $r_b$ denotes the penalty factor and $\kappa_b$ denotes the gradient factor that determines the smoothness of the boundary function; the penalty factor $r_b$ ranges from 10 to 50 and the gradient factor $\kappa_b$ ranges from 0.07 to 0.1;
when the Y coordinate of the mth unmanned aerial vehicle at the tth moment is greater than $u_{\max,y}$ or less than $u_{\min,y}$, using a computer to compute the boundary penalty term of the mth unmanned aerial vehicle at the tth moment with the corresponding Y-axis formula;
when the X coordinate is greater than $u_{\max,x}$ and the Y coordinate is greater than $u_{\max,y}$, or the X coordinate is less than $u_{\min,x}$ and the Y coordinate is less than $u_{\min,y}$, using a computer to compute the boundary penalty term of the mth unmanned aerial vehicle at the tth moment with the corresponding formula covering both axes;
The MAPPO algorithm-based multi-unmanned aerial vehicle and user cooperative communication optimization method is characterized in that: in step 301, w ranges from 3 to 20;
$\alpha_\phi$ and $\beta_\phi$ in step 306 satisfy $\alpha_\phi \ge 1$ and $\beta_\phi \ge 1$.
The MAPPO algorithm-based multi-unmanned aerial vehicle and user cooperative communication optimization method is characterized in that the maximum number of rounds $T_h$ set in step 704 ranges from 5000 to 6000.
Compared with the prior art, the invention has the following advantages:
1. The method has simple steps and a reasonable design, suits the game between multiple unmanned aerial vehicles and multiple users, predicts the cooperative communication optimization strategy, maximizes the throughput of the communication system through multidimensional decisions, and maintains fairness in resource allocation.
2. The method first establishes the unmanned aerial vehicle network model and the user network model; it then obtains training data through scene setting, acquisition of the observation states, acquisition of the global states, acquisition of the rewards, and storage of experience tuples; it trains on these data with the MAPPO algorithm to update and optimize the network model parameters, yielding an optimized network model; and finally it inputs the observation states of the users and unmanned aerial vehicles at subsequent moments into the optimized network model to obtain their cooperative communication optimization strategy.
3. The invention trains and iterates the parameters of the unmanned aerial vehicle Actor network, the user Actor network, the unmanned aerial vehicle Critic network and the user Critic network with the MAPPO algorithm, so that each user greedily maximizes its own communication rate through a competitive strategy, while each unmanned aerial vehicle intelligently allocates power and bandwidth resources to the users that choose to access it, dynamically decides its flight azimuth angle, and cooperates with the other unmanned aerial vehicles to form the most suitable spatial topology for the current environment.
4. The invention jointly optimizes the user access strategy, the power allocated by the unmanned aerial vehicles, the bandwidth resource scheduling of the unmanned aerial vehicles and their flight azimuth angles, with all unmanned aerial vehicles sharing the total bandwidth resource. Through dynamic resource scheduling it maximizes system throughput while guaranteeing fairness of communication rates among users, subject to each user's minimum communication rate constraint.
5. The invention adopts the MAPPO (Multi-Agent Proximal Policy Optimization) algorithm to handle the coexistence of discrete and continuous actions across different types of agents. Unlike previous methods that decide the multidimensional actions of an unmanned aerial vehicle cluster centrally, MAPPO accounts for partial observability under real conditions, so each agent makes decentralized decisions based only on its own observation. This overcomes drawbacks such as excessive dimensionality and poor scalability that arise when a single-agent reinforcement learning algorithm handles a multi-agent problem in a centralized manner.
6. Because different unmanned aerial vehicles may be accessed by different numbers of users, an action mask dynamically adjusts the dimensionality of each unmanned aerial vehicle's resource allocation strategy and masks out the users that have not chosen to access it, so each unmanned aerial vehicle only allocates resources to the users that access it.
7. Because the flight azimuth angle of the unmanned aerial vehicle is bounded, a parameterized Beta policy replaces the conventional Gaussian policy when optimizing it. This avoids the biased estimation that a Gaussian policy suffers when the action is bounded and mitigates convergence of the unmanned aerial vehicles to local optima in multi-peak reward environments.
8. The invention applies policy-based allocation not only to power but also to bandwidth, increasing the flexibility and degrees of freedom of the allocation.
In conclusion, the method provided by the invention has simple steps and a reasonable design. By optimizing the unmanned aerial vehicle and user network model parameters, it optimizes the flight azimuth angle and the power and bandwidth allocation of the unmanned aerial vehicles, adapts to the observation states of multiple unmanned aerial vehicles and multiple users to predict a reasonable cooperative communication optimization strategy, maximizes the throughput of the communication system through multidimensional decisions, and maintains fairness in resource allocation.
The technical solution of the present invention is described in further detail below with reference to the accompanying drawing and an embodiment.
Drawings
FIG. 1 is a block diagram of the process flow of the present invention.
Detailed Description
As shown in FIG. 1, a MAPPO-based method for optimizing cooperative communication between multiple UAVs and users comprises the following steps:
Step one, establishing the UAV network model and the user network model:
Step 101, denote the parameters of the UAV Actor network by $\phi$, those of the UAV Critic network by $\omega_1$, those of the user Actor network by $\theta$, and those of the user Critic network by $\omega_2$;
Step 102, set the initial values of $\phi$, $\omega_1$, $\theta$ and $\omega_2$ to $\phi(0)$, $\omega_1(0)$, $\theta(0)$ and $\omega_2(0)$ respectively, all obtained by orthogonal initialization of the neural networks;
Step two, setting up the UAV and user scenario:
Step 201, establish a two-dimensional Cartesian coordinate system OXY lying in the plane of the ground area D;
Step 202, place N users, indexed $n=1,\dots,N$, in the ground area D; the position of the n-th user at time t is given by its coordinates in OXY, where n and N are positive integers with $1\le n\le N$ and t is a positive integer; the ground area D lies in the first quadrant of OXY, with the origin O at the lower-left corner of D;
Step 203, place M UAVs, indexed $m=1,\dots,M$, above the ground area D; all M UAVs are deployed at the same height h above the ground area D;
Step three, acquiring the observation states of the UAVs and the users:
Step 301, define the observation state of the n-th user at time t as the collection of the user's own coordinates at time t, the two-dimensional coordinates in OXY of each m-th UAV that the user can access at time t, and $s_m(t-j)$, the number of users served by the m-th UAV at the j-th moment before time t, for $j=1,\dots,w$; here m and M are positive integers with $1\le m\le M$, j is a positive integer, and w is a positive integer with $w<t$;
Step 302, input the observation state of the n-th user at time t into the user Actor network with initial parameters $\theta(0)$; the network outputs a pre-activation component $x_m(\theta(0))$ for each m-th UAV;
Step 303, the computer applies the softmax $\dfrac{\exp(x_m(\theta(0)))}{\sum_{k=1}^{M}\exp(x_k(\theta(0)))}$ to obtain the discrete probability distribution of the n-th user selecting the m-th UAV at time t, where $\exp(\cdot)$ is the exponential function with base e;
Step 304, the n-th user samples its UAV-selection action at time t from this discrete distribution, accesses the selected UAV, and records the probability of the sampled action;
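The user-side decision in steps 302 to 304 is a categorical policy: a softmax over the Actor's pre-activation components, followed by sampling. The Python sketch below shows only that sampling step; the Actor network itself is replaced by a plain list of logits, and the function names `softmax` and `select_uav` are illustrative assumptions, not names from this document.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the pre-activation components x_m."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_uav(logits: Sequence[float], rng: random.Random) -> Tuple[int, float]:
    """Sample the UAV index a user accesses; return (index, probability of that choice)."""
    probs = softmax(logits)
    m = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return m, probs[m]


# Usage: three accessible UAVs with made-up logits.
rng = random.Random(0)
choice, prob = select_uav([0.2, 1.5, -0.3], rng)
```

The returned probability is what a PPO-style learner later needs as the "old" action probability when forming the importance ratio.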
Step 305, based on the users' selections and the UAV states, the computer defines the observation state of the m-th UAV at time t as the collection of its own two-dimensional coordinates in OXY at time t, the coordinates in OXY of every other UAV m' at time t (m' a positive integer, $m'\ne m$), and the access states $\sigma_{m,n}(t)$, where $\sigma_{m,n}(t)$ indicates whether the n-th user has accessed the m-th UAV;
Step 306, the computer inputs the observation state of the m-th UAV at time t into the UAV Actor network with initial parameters $\phi(0)$; the network outputs the probability distribution of the m-th UAV's action at time t given that observation, which follows a Beta distribution $\mathrm{Beta}(\alpha_\phi,\beta_\phi)$ with shape parameters $\alpha_\phi$ and $\beta_\phi$;
An action is sampled from this distribution, giving the transmit-power output value and the bandwidth output value of the m-th UAV for the n-th user at time t, the flight azimuth of the m-th UAV at time t, and the probability of the sampled action;
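Step 306 relies on a Beta policy for the bounded UAV actions. The sketch below assumes that each action dimension (per-user power, per-user bandwidth and the azimuth) is an independent Beta variable, that the Actor's raw outputs are mapped to shape parameters via `1 + softplus(.)` so that $\alpha_\phi\ge1$ and $\beta_\phi\ge1$, and that a sample in [0, 1] is rescaled to an azimuth in [0, 2π). None of these mappings is specified in the text; all helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    """Overflow-safe log(1 + e^x)."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def beta_params(raw_alpha: float, raw_beta: float) -> Tuple[float, float]:
    """Assumed parameterization: shape parameters are always >= 1."""
    return 1.0 + softplus(raw_alpha), 1.0 + softplus(raw_beta)


def beta_log_pdf(x: float, a: float, b: float) -> float:
    """Log density of Beta(a, b) at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x)


def sample_uav_action(raw_params: Sequence[Tuple[float, float]],
                      rng: random.Random) -> Tuple[List[float], float]:
    """Sample one bounded value per action dimension; return (samples, joint log-prob)."""
    samples, log_prob = [], 0.0
    for raw_a, raw_b in raw_params:
        a, b = beta_params(raw_a, raw_b)
        x = rng.betavariate(a, b)
        x = min(max(x, 1e-6), 1.0 - 1e-6)  # keep the log-density finite at the edges
        samples.append(x)
        log_prob += beta_log_pdf(x, a, b)  # dimensions assumed independent
    return samples, log_prob


def to_azimuth(x: float) -> float:
    """Rescale a Beta sample in [0, 1] to an azimuth in radians (assumed range [0, 2*pi))."""
    return 2.0 * math.pi * x
```

Because the Beta support is exactly [0, 1], no probability mass falls outside the action bounds, which is the property the Gaussian policy lacks when its samples have to be clipped.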
Step 307, the computer sets the action mask of the m-th UAV at time t, which marks the users that have accessed it, and applies the mask to the power output value and the bandwidth output value for each n-th user, giving the masked power value and the masked bandwidth value of the m-th UAV for the n-th user at time t;
Step 308, the computer obtains from the masked power values the transmit-power action component $p_{m,n}(t)$ allocated by the m-th UAV to the n-th user at time t;
Likewise, the computer obtains from the masked bandwidth values the bandwidth action component $b_{m,n}(t)$ allocated by the m-th UAV to the n-th user at time t, where $b_m(t)$ is the bandwidth the m-th UAV can allocate at time t and is defined in terms of $B_{total}$, the total bandwidth shared by all UAVs; $s_m(t)$ is the total number of users accessing the m-th UAV, and $b_{min}$ is the minimum separable bandwidth;
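Steps 307 and 308 turn masked Actor outputs into concrete allocations, but the exact normalization formulas are not given in this text. The sketch below is one hypothetical scheme that uses the quantities named there. Masked power outputs are renormalized and scaled by $P_{total}$. For bandwidth, each accessed user first receives $b_{min}$, and the masked outputs then share the rest of $b_m(t)$. The budget $b_m(t)=B_{total}\,s_m(t)/N$ is an assumption made only for illustration.

```python
from typing import List, Sequence


def masked_shares(outputs: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out users that did not access this UAV, then renormalize to sum to 1."""
    masked = [o * k for o, k in zip(outputs, mask)]
    total = sum(masked)
    if total <= 0.0:
        return [0.0] * len(masked)
    return [v / total for v in masked]


def allocate_power(power_out: Sequence[float], mask: Sequence[int],
                   p_total: float) -> List[float]:
    """Hypothetical p_{m,n}(t): masked shares of the UAV's total transmit power."""
    return [p_total * s for s in masked_shares(power_out, mask)]


def allocate_bandwidth(bw_out: Sequence[float], mask: Sequence[int],
                       b_total: float, n_users: int, b_min: float) -> List[float]:
    """Hypothetical b_{m,n}(t): b_min per accessed user plus a masked share of the rest."""
    s_m = sum(mask)  # number of users accessing this UAV
    if s_m == 0:
        return [0.0] * len(mask)
    b_m = b_total * s_m / n_users  # assumed budget b_m(t)
    spare = max(b_m - s_m * b_min, 0.0)
    shares = masked_shares(bw_out, mask)
    return [k * b_min + spare * s for k, s in zip(mask, shares)]
```

With the embodiment values ($P_{total}$ = 10 mW, $B_{total}$ = 30 MHz, $b_{min}$ = 0.1 MHz), a user masked out by the access indicator always receives zero power and zero bandwidth, which is the effect the action mask is meant to have.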
Step 309, the computer assembles the action of the m-th UAV at time t from $p_{m,n}(t)$, $b_{m,n}(t)$ and the flight azimuth of the m-th UAV at time t;
Step 30A, the observation state of the n-th user and the observation state of the m-th UAV at time t are merged and written as the observation state of the i-th agent at time t, where the agents comprise the M UAVs and the N users and i is a positive integer with $1\le i\le M+N$;
the UAV-selection action of the n-th user and the action of the m-th UAV at time t are merged and written as the action of the i-th agent at time t;
the probability of the n-th user's UAV-selection action and the probability of the m-th UAV's action at time t are merged and written as the action probability of the i-th agent;
Step four, acquiring the global states of the UAVs and the users:
Step 401, the computer feeds $p_{m,n}(t)$ and $b_{m,n}(t)$ from step 309 into the Shannon channel-capacity formula to obtain the theoretical communication rate $c_{m,n}(t)$ provided by the m-th UAV to the n-th user at time t;
Step 402, the computer obtains the communication rate of the n-th user at time t from the rates $c_{m,n}(t)$;
Step 404, define the global state of the n-th user at time t, whose components include the coordinates in OXY of the other users n' at time t (n' a positive integer, $n'\ne n$);
Step 405, the global state of the m-th UAV at time t and the global state of the n-th user at time t are merged and written as the global state of the i-th agent at time t, where i is a positive integer with $1\le i\le M+N$;
Step five, obtaining the rewards of the UAVs and the users:
Step 501, the computer obtains $c_{mean}(t)$, the average communication rate of the N users at time t;
Step 502, the computer obtains the fairness index $f_m(t)$ of the m-th UAV at time t;
Step 503, the computer obtains the reward of the m-th UAV at time t, where $r_d$ is the UAV reward coefficient, $\kappa_r$ is the exponent applied to $f_m(t)$, and the reward includes the boundary penalty term of the m-th UAV at time t;
Step 504, the computer obtains the reward of the n-th user at time t, where $r_c$ is the user reward coefficient;
Step 505, the computer merges the reward of the n-th user and the reward of the m-th UAV at time t and writes them as the reward of the i-th agent at time t;
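Steps 501 to 505 combine a network-wide mean rate, a per-UAV fairness index and a boundary penalty, but the formulas are not given in this text. As a hedged sketch, the code below assumes that $f_m(t)$ is Jain's fairness index over the rates of the users served by the m-th UAV, and that the UAV reward has the hypothetical form $r_d\,f_m(t)^{\kappa_r}\,c_{mean}(t)$ minus the boundary penalty. Both forms are assumptions chosen to match the named quantities, not the document's own equations.

```python
from typing import Sequence


def mean_rate(rates: Sequence[float]) -> float:
    """c_mean(t): average communication rate over all N users."""
    return sum(rates) / len(rates) if rates else 0.0


def jain_fairness(rates: Sequence[float]) -> float:
    """Jain's index: 1 when all served users get equal rates, 1/k when one of k gets everything."""
    square_sum = sum(r * r for r in rates)
    if not rates or square_sum == 0.0:
        return 0.0
    return sum(rates) ** 2 / (len(rates) * square_sum)


def uav_reward(served_rates: Sequence[float], all_rates: Sequence[float],
               r_d: float, kappa_r: int, boundary_penalty: float) -> float:
    """Hypothetical UAV reward: fairness-weighted mean rate minus the boundary penalty."""
    return r_d * jain_fairness(served_rates) ** kappa_r * mean_rate(all_rates) - boundary_penalty
```

Raising the fairness index to a power such as $\kappa_r$ = 5 (the embodiment value) makes the reward drop sharply as soon as one served user is starved, which is one way a throughput reward can be pushed toward fair allocations.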
Step six, storing experience tuples:
Step 601, the computer stores the experience tuple of the i-th agent at time t in a buffer;
Step 602, repeat step three through step 601 to obtain and buffer the experience tuples of each subsequent moment until $t=T_{max}$, which completes the data storage of one round, where $T_{max}$ is the total number of moments per round;
Step 603, repeat step 602 to store the data of further rounds until the buffer holds B experience tuples, which constitutes the first round of training data, where $B>T_{max}$;
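Steps 601 to 603 describe on-policy rollout storage: one round runs for $T_{max}$ moments, and rounds are collected until the buffer holds B tuples. The sketch below models that loop in plain Python. The fields of `Transition` are an assumption about what the experience tuple contains, and `env_step` is a hypothetical callback standing in for steps three to five.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Transition:
    """One experience tuple of agent i at time t (field set assumed)."""
    obs: Any         # local observation, input to the Actor
    state: Any       # global state, input to the centralized Critic
    action: Any      # sampled action
    log_prob: float  # log-probability of the action under the behaviour policy
    reward: float    # reward of agent i at time t
    done: bool       # True on the last moment of a round (t == T_max)


@dataclass
class RolloutBuffer:
    capacity: int  # B in step 603
    items: List[Transition] = field(default_factory=list)

    def add(self, transition: Transition) -> None:
        self.items.append(transition)

    def full(self) -> bool:
        return len(self.items) >= self.capacity

    def drain(self) -> List[Transition]:
        """Hand the batch to the learner and clear the buffer for the next round of data."""
        batch, self.items = self.items, []
        return batch


def collect(env_step: Callable[[int, bool], Transition], t_max: int,
            capacity: int) -> List[Transition]:
    """Run rounds of t_max moments until the buffer holds `capacity` tuples."""
    buffer = RolloutBuffer(capacity)
    while not buffer.full():
        for t in range(1, t_max + 1):  # one round: t = 1 .. T_max
            buffer.add(env_step(t, t == t_max))
            if buffer.full():  # stop mid-round once B tuples are stored
                break
    return buffer.drain()
```

With the embodiment values ($T_{max}$ = 1000, B between 2000 and 4000) each batch spans two to four rounds.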
Step seven, iteratively optimizing the network model parameters with the MAPPO algorithm:
Step 701, input the first round of training data; the computer uses the MAPPO algorithm to optimize the UAV Actor parameters $\phi$ and the user Actor parameters $\theta$ by gradient ascent, obtaining their first-round optimized values;
at the same time, the computer uses the MAPPO algorithm to optimize the parameters $\omega_1$ of the centralized UAV Critic network and $\omega_2$ of the centralized user Critic network by gradient descent, obtaining their first-round optimized values;
Step 702, obtain the next round of training data by the method of step three through step 603;
Step 703, input the next round of training data and, by the method of step 701 with the previous round's optimized values as initial parameter values, perform the next optimization update, obtaining the next-round optimized values of $\phi$, $\theta$, $\omega_1$ and $\omega_2$;
Step 704, continue by the method of step three through step 603 until data storage for the preset maximum number of rounds $T_h$ is completed, obtaining the P-th round of training data, where P is a positive integer;
Step 705, input the P-th round of training data and, by the method of step 701 with the previous round's optimized values as initial parameter values, obtain the P-th-round optimized values of $\phi$, $\theta$, $\omega_1$ and $\omega_2$;
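Step seven names the MAPPO update without spelling out its objectives. MAPPO is generally described as PPO applied to every agent with a centralized critic, so the sketch below shows the standard ingredients under that assumption. Generalized advantage estimation turns critic values into advantages; the Actor maximizes the clipped surrogate objective (the gradient ascent of step 701); the Critic minimizes a squared-error loss (the gradient descent). Only the scalar objectives are computed here; in practice an automatic-differentiation framework would compute their gradients. The values gamma = 0.99, lam = 0.95 and eps = 0.2 are common defaults, not values from this document.

```python
import math
from typing import List, Sequence, Tuple


def gae(rewards: Sequence[float], values: Sequence[float], dones: Sequence[bool],
        gamma: float = 0.99, lam: float = 0.95,
        last_value: float = 0.0) -> Tuple[List[float], List[float]]:
    """Generalized advantage estimation for one agent's trajectory.

    values[t] is the centralized critic's estimate V(s_t); last_value bootstraps
    past the final stored step when that step is not terminal.
    """
    n = len(rewards)
    advantages = [0.0] * n
    running = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    returns = [a + v for a, v in zip(advantages, values)]  # critic regression targets
    return advantages, returns


def clipped_surrogate(new_log_probs: Sequence[float], old_log_probs: Sequence[float],
                      advantages: Sequence[float], eps: float = 0.2) -> float:
    """PPO clipped objective; the Actor maximizes this."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|o) / pi_old(a|o)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def critic_loss(predicted_values: Sequence[float], returns: Sequence[float]) -> float:
    """Mean squared error; the Critic minimizes this."""
    return sum((p - r) ** 2 for p, r in zip(predicted_values, returns)) / len(returns)
```

Clipping the probability ratio keeps each round's policy update close to the policy that generated the buffered data, which is why the stored action probabilities from step 30A are needed.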
Step eight, optimizing and predicting multi-UAV, multi-user cooperative communication:
Step 801, obtain the optimized network model from the P-th-round optimized values of $\phi$, $\theta$, $\omega_1$ and $\omega_2$;
Step 802, acquire the observation states of the n-th user and of the m-th UAV at subsequent moments and input them into the optimized network model to obtain the cooperative communication action policy of the m-th UAV and the n-th user at those moments.
In this embodiment, step 401, in which the computer feeds $p_{m,n}(t)$ and $b_{m,n}(t)$ from step 309 into the Shannon channel-capacity formula to obtain the theoretical communication rate $c_{m,n}(t)$ provided by the m-th UAV to the n-th user at time t, proceeds as follows:
Step 4011, the computer obtains the line-of-sight (LoS) link probability $P^{LoS}_{m,n}(t)$ from the m-th UAV to the n-th user at time t, where a and b are the first and second environment-dependent constants and $d_{m,n}(t)$ is the straight-line distance from the m-th UAV to the n-th user at time t;
Step 4012, the computer obtains the LoS path loss from the m-th UAV to the n-th user at time t as $PL^{LoS}_{m,n}(t)=20\log_{10}\left(\frac{4\pi f_c d_{m,n}(t)}{c}\right)+\xi_{LoS}$, where $\xi_{LoS}$ is the additional loss of the LoS link, c is the speed of light and $f_c$ is the signal carrier frequency;
Step 4013, the computer obtains the non-line-of-sight (NLoS) path loss from the m-th UAV to the n-th user at time t as $PL^{NLoS}_{m,n}(t)=20\log_{10}\left(\frac{4\pi f_c d_{m,n}(t)}{c}\right)+\xi_{NLoS}$, where $\xi_{NLoS}$ is the additional loss of the NLoS link;
Step 4014, the computer obtains the path loss of the signal from the m-th UAV to the n-th user as $PL_{m,n}(t)=P^{LoS}_{m,n}(t)\,PL^{LoS}_{m,n}(t)+P^{NLoS}_{m,n}(t)\,PL^{NLoS}_{m,n}(t)$, where $P^{NLoS}_{m,n}(t)=1-P^{LoS}_{m,n}(t)$ is the NLoS link probability from the m-th UAV to the n-th user at time t;
Step 4015, the computer obtains, from $p_{m,n}(t)$ and $PL_{m,n}(t)$, the power $P^{r}_{m,n}(t)$ of the m-th UAV's signal received by the n-th user at time t;
Step 4016, the computer obtains the theoretical communication rate $c_{m,n}(t)=b_{m,n}(t)\log_2\left(1+\frac{P^{r}_{m,n}(t)}{n_0\,b_{m,n}(t)}\right)$ provided by the m-th UAV to the n-th user at time t, where $n_0$ is the power spectral density of the Gaussian white noise in the channel.
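Steps 4011 to 4016 follow the structure of the widely used probabilistic air-to-ground channel model of Al-Hourani et al., whose environment constants a and b fall in the ranges given in this embodiment. The Python sketch below implements that standard model under several assumptions. The LoS probability is taken as $1/(1+a\exp(-b(\theta-a)))$ with elevation angle $\theta=\frac{180}{\pi}\arcsin(h/d)$ in degrees. Received power is obtained by subtracting the path loss in dB from the transmit power, and all powers are in mW. The urban values a = 9.61, b = 0.16, $\xi_{LoS}$ = 1 dB and $\xi_{NLoS}$ = 20 dB in the usage line are commonly cited literature values, not values from this document.

```python
import math

C_LIGHT = 3e8  # speed of light, m/s


def los_probability(h: float, d: float, a: float, b: float) -> float:
    """Assumed sigmoid LoS probability; d is the 3-D distance (d >= h)."""
    theta = math.degrees(math.asin(h / d))  # elevation angle in degrees
    return 1.0 / (1.0 + a * math.exp(-b * (theta - a)))


def path_loss_db(h: float, d: float, f_c: float, a: float, b: float,
                 xi_los: float, xi_nlos: float) -> float:
    """Expected path loss: free-space loss plus the LoS/NLoS additional losses."""
    fspl = 20.0 * math.log10(4.0 * math.pi * f_c * d / C_LIGHT)
    p_los = los_probability(h, d, a, b)
    return p_los * (fspl + xi_los) + (1.0 - p_los) * (fspl + xi_nlos)


def rate_bps(p_mw: float, bw_hz: float, h: float, d: float, f_c: float,
             a: float, b: float, xi_los: float, xi_nlos: float,
             n0_mw_per_hz: float) -> float:
    """Shannon rate b * log2(1 + P_r / (n0 * b)) for one UAV-user link."""
    if bw_hz <= 0.0 or p_mw <= 0.0:
        return 0.0
    rx_mw = p_mw * 10.0 ** (-path_loss_db(h, d, f_c, a, b, xi_los, xi_nlos) / 10.0)
    snr = rx_mw / (n0_mw_per_hz * bw_hz)
    return bw_hz * math.log2(1.0 + snr)


# Usage with the embodiment's h = 500 m, f_c = 2 GHz and n0 = 1e-17 mW/Hz:
# 5 mW and 5 MHz allocated to a user 600 m away (3-D distance).
example = rate_bps(5.0, 5e6, 500.0, 600.0, 2e9, 9.61, 0.16, 1.0, 20.0, 1e-17)
```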
In this embodiment, in step 4011, $4.88<a<28$ and $0<b<1$;
in steps 4012 and 4013, the additional NLoS loss $\xi_{NLoS}$ is greater than the additional LoS loss $\xi_{LoS}$, with $\xi_{LoS}\in(0\text{ dB},50\text{ dB})$ and $\xi_{NLoS}\in(10\text{ dB},100\text{ dB})$;
the user reward coefficient $r_c$ in step 504 ranges from 1 to 3;
the UAV reward coefficient $r_d$ in step 503 ranges from 1 to 5, with $r_d>r_c$; the exponent $\kappa_r$ is a positive integer from 1 to 5.
In this embodiment, the boundary penalty term of the m-th UAV at time t in step 503 is obtained as follows:
Step 5031, denote the upper bounds of the ground area D on the X and Y axes by $u_{max,x}$ and $u_{max,y}$, and the lower bounds by $u_{min,x}$ and $u_{min,y}$, with $u_{min,x}=u_{min,y}=0$;
Step 5032, the computer extracts the X coordinate and the Y coordinate of the m-th UAV at time t from its position at time t;
Step 5033, when the X coordinate of the m-th UAV at time t is greater than $u_{max,x}$ or less than $u_{min,x}$, the computer computes the boundary penalty term of the m-th UAV at time t from the X coordinate, where $r_b$ is the penalty coefficient, ranging from 10 to 50, and $\kappa_b$ is a gradient factor, ranging from 0.07 to 0.1, that determines the smoothness of the boundary function;
when the Y coordinate is greater than $u_{max,y}$ or less than $u_{min,y}$, the computer computes the boundary penalty term of the m-th UAV at time t from the Y coordinate;
when the X coordinate is greater than $u_{max,x}$ and the Y coordinate is greater than $u_{max,y}$, or the X coordinate is less than $u_{min,x}$ and the Y coordinate is less than $u_{min,y}$, the computer computes the boundary penalty term of the m-th UAV at time t from both coordinates.
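The case analysis of step 5033 can be expressed with a per-axis overshoot. The penalty shape is not given in this text, so the sketch below uses a hypothetical smooth, saturating form $r_b(1-e^{-\kappa_b d})$, where d is the distance (assumed in metres) beyond the violated bound. It also assumes that the corner case simply adds the two per-axis terms. The defaults $r_b$ = 20 and $\kappa_b$ = 0.08 follow the embodiment; the functional form does not come from the document, and the returned magnitude is assumed to be subtracted from the UAV reward.

```python
import math


def axis_overshoot(v: float, lo: float, hi: float) -> float:
    """Distance by which coordinate v lies outside [lo, hi]; 0 inside."""
    if v > hi:
        return v - hi
    if v < lo:
        return lo - v
    return 0.0


def boundary_penalty(x: float, y: float, u_max_x: float, u_max_y: float,
                     r_b: float = 20.0, kappa_b: float = 0.08,
                     u_min_x: float = 0.0, u_min_y: float = 0.0) -> float:
    """Hypothetical penalty magnitude for step 5033 (0 inside area D).

    Each violated axis contributes r_b * (1 - exp(-kappa_b * overshoot));
    a corner violation (both axes out of range) adds the two contributions.
    """
    total = 0.0
    for v, lo, hi in ((x, u_min_x, u_max_x), (y, u_min_y, u_max_y)):
        d = axis_overshoot(v, lo, hi)
        if d > 0.0:
            total += r_b * (1.0 - math.exp(-kappa_b * d))
    return total
```

For the 2 km × 2 km area of this embodiment, `u_max_x = u_max_y = 2000.0`.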
In this embodiment, w in step 301 ranges from 3 to 20;
$\alpha_\phi$ and $\beta_\phi$ in step 306 satisfy $\alpha_\phi\ge1$ and $\beta_\phi\ge1$.
In this embodiment, the maximum number of rounds $T_h$ set in step 704 ranges from 5000 to 6000.
In this embodiment, the area D is a 2 km × 2 km square and the deployment height h of the M UAVs above the ground area D is 500 m; at the start of each round all UAVs take off from the origin, the users are randomly distributed in D and move in random directions at random speeds, and $T_{max}=1000$.
In this embodiment, the maximum number of rounds $T_h$ is 5000 and B ranges from 2000 to 4000.
In this embodiment, w is 3.
In this embodiment, the total transmit power of each UAV is $P_{total}=10$ mW, the total bandwidth shared by all UAVs is $B_{total}=30$ MHz, the signal carrier frequency is $f_c=2$ GHz, the power spectral density of the Gaussian white noise in the channel is $n_0=1\times10^{-17}$ mW/Hz, and the minimum separable bandwidth is $b_{min}=0.1$ MHz.
In this embodiment, $T_{max}=1000$ and each decision interval is 1 s, i.e., moments t and t+1 are 1 s apart.
In this embodiment, $\sigma_{m,n}(t)=1$ means that the n-th user has selected the m-th UAV as its access base station; otherwise $\sigma_{m,n}(t)=0$.
In this embodiment, the user reward coefficient $r_c$ is 1, the UAV reward coefficient $r_d$ is 2, and the exponent $\kappa_r$ is 5.
In this embodiment, the penalty coefficient $r_b$ is 20 and the gradient factor $\kappa_b$ is $8\times10^{-2}$.
In this example, αφ=βφ=1。
The above description is only a preferred embodiment of the present invention and is not intended to limit it; any simple modification, change or equivalent structural variation made to the above embodiment according to the technical essence of the present invention still falls within the protection scope of the technical solution of the present invention.
Claims (6)
1. A MAPPO-based method for optimizing cooperative communication between multiple unmanned aerial vehicles and users, characterized by comprising the following steps:
Step one, establishing the UAV network model and the user network model:
Step 101, denote the parameters of the UAV Actor network by $\phi$, those of the UAV Critic network by $\omega_1$, those of the user Actor network by $\theta$, and those of the user Critic network by $\omega_2$;
Step 102, set the initial values of $\phi$, $\omega_1$, $\theta$ and $\omega_2$ to $\phi(0)$, $\omega_1(0)$, $\theta(0)$ and $\omega_2(0)$ respectively, all obtained by orthogonal initialization of the neural networks;
Step two, setting up the UAV and user scenario:
Step 201, establish a two-dimensional Cartesian coordinate system OXY lying in the plane of the ground area D;
Step 202, place N users, indexed $n=1,\dots,N$, in the ground area D; the position of the n-th user at time t is given by its coordinates in OXY, where n and N are positive integers with $1\le n\le N$ and t is a positive integer; the ground area D lies in the first quadrant of OXY, with the origin O at the lower-left corner of D;
Step 203, place M UAVs, indexed $m=1,\dots,M$, above the ground area D; all M UAVs are deployed at the same height h above the ground area D;
Step three, acquiring the observation states of the UAVs and the users:
Step 301, define the observation state of the n-th user at time t as the collection of the user's own coordinates at time t, the two-dimensional coordinates in OXY of each m-th UAV that the user can access at time t, and $s_m(t-j)$, the number of users served by the m-th UAV at the j-th moment before time t, for $j=1,\dots,w$; here m and M are positive integers with $1\le m\le M$, j is a positive integer, and w is a positive integer with $w<t$;
Step 302, input the observation state of the n-th user at time t into the user Actor network with initial parameters $\theta(0)$; the network outputs a pre-activation component $x_m(\theta(0))$ for each m-th UAV;
Step 303, the computer applies the softmax $\dfrac{\exp(x_m(\theta(0)))}{\sum_{k=1}^{M}\exp(x_k(\theta(0)))}$ to obtain the discrete probability distribution of the n-th user selecting the m-th UAV at time t, where $\exp(\cdot)$ is the exponential function with base e;
Step 304, the n-th user samples its UAV-selection action at time t from this discrete distribution, accesses the selected UAV, and records the probability of the sampled action;
Step 305, based on the users' selections and the UAV states, the computer defines the observation state of the m-th UAV at time t as the collection of its own two-dimensional coordinates in OXY at time t, the coordinates in OXY of every other UAV m' at time t (m' a positive integer, $m'\ne m$), and the access states $\sigma_{m,n}(t)$, where $\sigma_{m,n}(t)$ indicates whether the n-th user has accessed the m-th UAV;
Step 306, the computer inputs the observation state of the m-th UAV at time t into the UAV Actor network with initial parameters $\phi(0)$; the network outputs the probability distribution of the m-th UAV's action at time t given that observation, which follows a Beta distribution $\mathrm{Beta}(\alpha_\phi,\beta_\phi)$ with shape parameters $\alpha_\phi$ and $\beta_\phi$;
An action is sampled from this distribution, giving the transmit-power output value and the bandwidth output value of the m-th UAV for the n-th user at time t, the flight azimuth of the m-th UAV at time t, and the probability of the sampled action;
Step 307, the computer sets the action mask of the m-th UAV at time t, which marks the users that have accessed it, and applies the mask to the power output value and the bandwidth output value for each n-th user, giving the masked power value and the masked bandwidth value of the m-th UAV for the n-th user at time t;
Step 308, the computer obtains from the masked power values the transmit-power action component $p_{m,n}(t)$ allocated by the m-th UAV to the n-th user at time t;
Likewise, the computer obtains from the masked bandwidth values the bandwidth action component $b_{m,n}(t)$ allocated by the m-th UAV to the n-th user at time t, where $b_m(t)$ is the bandwidth the m-th UAV can allocate at time t and is defined in terms of $B_{total}$, the total bandwidth shared by all UAVs; $s_m(t)$ is the total number of users accessing the m-th UAV, $b_{min}$ is the minimum separable bandwidth, and $P_{total}$ is the total transmit power of each UAV;
Step 309, the computer assembles the action of the m-th UAV at time t from $p_{m,n}(t)$, $b_{m,n}(t)$ and the flight azimuth of the m-th UAV at time t;
Step 30A, the observation state of the n-th user and the observation state of the m-th UAV at time t are merged and written as the observation state of the i-th agent at time t, where the agents comprise the M UAVs and the N users and i is a positive integer with $1\le i\le M+N$;
the UAV-selection action of the n-th user and the action of the m-th UAV at time t are merged and written as the action of the i-th agent at time t;
the probability of the n-th user's UAV-selection action and the probability of the m-th UAV's action at time t are merged and written as the action probability of the i-th agent;
Step four, acquiring the global states of the UAVs and the users:
Step 401, the computer feeds $p_{m,n}(t)$ and $b_{m,n}(t)$ from step 309 into the Shannon channel-capacity formula to obtain the theoretical communication rate $c_{m,n}(t)$ provided by the m-th UAV to the n-th user at time t;
Step 402, the computer obtains the communication rate of the n-th user at time t from the rates $c_{m,n}(t)$;
Step 404, define the global state of the n-th user at time t, whose components include the coordinates in OXY of the other users n' at time t (n' a positive integer, $n'\ne n$);
Step 405, the global state of the m-th UAV at time t and the global state of the n-th user at time t are merged and written as the global state of the i-th agent at time t, where i is a positive integer with $1\le i\le M+N$;
Step five, obtaining the rewards of the UAVs and the users:
Step 501, the computer obtains $c_{mean}(t)$, the average communication rate of the N users at time t;
Step 502, the computer obtains the fairness index $f_m(t)$ of the m-th UAV at time t;
Step 503, the computer obtains the reward of the m-th UAV at time t, where $r_d$ is the UAV reward coefficient, $\kappa_r$ is the exponent applied to $f_m(t)$, and the reward includes the boundary penalty term of the m-th UAV at time t;
Step 504, the computer obtains the reward of the n-th user at time t, where $r_c$ is the user reward coefficient;
Step 505, the computer merges the reward of the n-th user and the reward of the m-th UAV at time t and writes them as the reward of the i-th agent at time t;
Step six, storing experience tuples:
Step 601, the computer stores the experience tuple of the i-th agent at time t in a buffer;
Step 602, repeat step three through step 601 to obtain and buffer the experience tuples of each subsequent moment until $t=T_{max}$, which completes the data storage of one round, where $T_{max}$ is the total number of moments per round;
Step 603, repeat step 602 to store the data of further rounds until the buffer holds B experience tuples, which constitutes the first round of training data, where $B>T_{max}$;
Step seven, iteratively optimizing the network model parameters with the MAPPO algorithm:
Step 701, input the first round of training data; the computer uses the MAPPO algorithm to optimize the UAV Actor parameters $\phi$ and the user Actor parameters $\theta$ by gradient ascent, obtaining their first-round optimized values;
at the same time, the computer uses the MAPPO algorithm to optimize the parameters $\omega_1$ of the centralized UAV Critic network and $\omega_2$ of the centralized user Critic network by gradient descent, obtaining their first-round optimized values;
Step 702, obtain the next round of training data by the method of step three through step 603;
Step 703, input the next round of training data and, by the method of step 701 with the previous round's optimized values as initial parameter values, perform the next optimization update, obtaining the next-round optimized values of $\phi$, $\theta$, $\omega_1$ and $\omega_2$;
Step 704, continue by the method of step three through step 603 until data storage for the preset maximum number of rounds $T_h$ is completed, obtaining the P-th round of training data, where P is a positive integer;
Step 705, input the P-th round of training data and, by the method of step 701 with the previous round's optimized values as initial parameter values, obtain the P-th-round optimized values of $\phi$, $\theta$, $\omega_1$ and $\omega_2$;
step eight, optimizing and predicting the cooperative communication of multiple unmanned aerial vehicles and multiple users:
step 801, according to the P-th round optimization value of the parameter phi of the unmanned plane Actor network, the P-th round optimization value of the parameter theta of the user Actor network, and the parameter omega of the unmanned plane Critic network1The P-th round optimization value and the parameter omega of the user Critic network2The P-th best ofChanging the value to obtain an optimized network model;
step 802, acquire the observation state of the nth user and the observation state of the mth unmanned aerial vehicle at a subsequent moment, and input them into the optimized network model to obtain the cooperative communication optimization action strategy of the mth unmanned aerial vehicle and the nth user at that moment.
2. The method for optimizing cooperative communication between multiple unmanned aerial vehicles and users based on the MAPPO algorithm according to claim 1, wherein in step 401 the computer takes p_{m,n}(t) and b_{m,n}(t) obtained in step 309 and, according to the Shannon channel capacity, obtains the theoretical communication rate c_{m,n}(t) provided by the mth unmanned aerial vehicle to the nth user at the tth moment, as follows:
step 4011, the computer obtains, according to the formula, the LoS link probability from the mth unmanned aerial vehicle to the nth user at the tth moment, where a denotes a first environment-dependent constant, b denotes a second environment-dependent constant, and d_{m,n}(t) denotes the straight-line distance from the mth unmanned aerial vehicle to the nth user at the tth moment;
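The LoS probability formula itself is missing from this text. A widely used air-to-ground model with two environment constants is the Al-Hourani sigmoid, P_LoS = 1 / (1 + a·exp(-b(θ - a))), where θ is the elevation angle in degrees; the ranges in claim 3 (4.88 < a < 28, 0 < b < 1) are consistent with the constants usually quoted for it. The sketch below assumes that model and a known UAV altitude `height_m` (a hypothetical input not named in this excerpt); it is an illustration, not the patent's formula.

```python
import math

def los_probability(height_m, distance_m, a, b):
    """LoS probability under an assumed Al-Hourani-style sigmoid model.

    height_m   : UAV altitude above the user (hypothetical input)
    distance_m : straight-line UAV-to-user distance d_{m,n}(t)
    a, b       : environment-dependent constants
    """
    if not 0 < height_m <= distance_m:
        raise ValueError("need 0 < height <= straight-line distance")
    # Elevation angle in degrees, derived from altitude and straight-line distance.
    elevation_deg = math.degrees(math.asin(height_m / distance_m))
    return 1.0 / (1.0 + a * math.exp(-b * (elevation_deg - a)))
```

Under this model the LoS probability rises with the elevation angle, so a UAV directly above a user is almost certainly in LoS.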
step 4012, the computer obtains, according to the formula, the path loss from the mth unmanned aerial vehicle to the nth user at the tth moment under the LoS link, where ξ_LoS denotes the additional loss under the LoS link, c denotes the speed of light, and f_c denotes the signal carrier frequency;
step 4013, the computer obtains, according to the formula, the path loss from the mth unmanned aerial vehicle to the nth user at the tth moment under the NLoS link, where ξ_NLoS denotes the additional loss under the NLoS link;
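Steps 4012 and 4013 name the speed of light c, the carrier frequency f_c and an additional loss ξ, which matches the common form "free-space path loss plus excess loss", PL = 20·log10(4π·f_c·d / c) + ξ in dB. The sketch below assumes that form (the patent's exact expression is not reproduced here); the same function serves both links, with ξ_LoS or ξ_NLoS passed as `extra_loss_db`.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def path_loss_db(distance_m, carrier_hz, extra_loss_db):
    """Free-space path loss plus an additional loss term, in dB (assumed form)."""
    fspl_db = 20.0 * math.log10(4.0 * math.pi * carrier_hz * distance_m / SPEED_OF_LIGHT)
    return fspl_db + extra_loss_db

# Usage: LoS and NLoS losses differ only by their additional-loss terms.
pl_los = path_loss_db(500.0, 2e9, extra_loss_db=1.0)
pl_nlos = path_loss_db(500.0, 2e9, extra_loss_db=20.0)
```

Since claim 3 requires ξ_NLoS > ξ_LoS, the NLoS path loss computed this way is always the larger of the two at the same distance.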
step 4014, the computer obtains, according to the formula, the path loss PL_{m,n}(t) of the signal from the mth unmanned aerial vehicle to the nth user, where the formula involves the NLoS link probability from the mth unmanned aerial vehicle to the nth user at the tth moment;
step 4015, the computer obtains, according to the formula, the signal power received by the nth user from the mth unmanned aerial vehicle at the tth moment.
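Steps 4014, 4015 and 401 can be chained as a sketch under stated assumptions: the average path loss weights the LoS and NLoS losses by their probabilities (assuming P_NLoS = 1 - P_LoS), the received power is the transmit power p_{m,n}(t) in dBm minus the path loss, and the rate follows the Shannon capacity c = b·log2(1 + SNR). The noise power `noise_power_dbm` is a hypothetical input; the excerpt does not state how the patent models noise or interference.

```python
import math

def average_path_loss_db(p_los, pl_los_db, pl_nlos_db):
    """Probability-weighted path loss, assuming P_NLoS = 1 - P_LoS."""
    p_nlos = 1.0 - p_los
    return p_los * pl_los_db + p_nlos * pl_nlos_db

def received_power_dbm(tx_power_dbm, path_loss_db):
    """Received signal power in dBm: transmit power minus path loss."""
    return tx_power_dbm - path_loss_db

def shannon_rate_bps(bandwidth_hz, rx_power_dbm, noise_power_dbm):
    """Shannon capacity b * log2(1 + SNR), with SNR converted from dB to linear."""
    snr_linear = 10.0 ** ((rx_power_dbm - noise_power_dbm) / 10.0)
    return bandwidth_hz * math.log2(1.0 + snr_linear)
```

Working in dB keeps the path-loss arithmetic additive; the only linear-domain conversion happens inside the capacity formula.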
3. The method for optimizing cooperative communication between multiple unmanned aerial vehicles and users based on the MAPPO algorithm according to claim 2, wherein in step 4011, 4.88 < a < 28 and 0 < b < 1;
in steps 4012 and 4013, the additional loss ξ_NLoS under the NLoS link is greater than the additional loss ξ_LoS under the LoS link; ξ_LoS takes values in (0 dB, 50 dB) and ξ_NLoS takes values in (10 dB, 100 dB);
the user reward factor r_c in step 504 takes values from 1 to 3;
the unmanned aerial vehicle reward factor r_d in step 503 takes values from 1 to 5, with r_d greater than r_c; the index parameter κ_r is a positive integer from 1 to 5.
4. The method for optimizing cooperative communication between multiple unmanned aerial vehicles and users based on the MAPPO algorithm according to claim 1, wherein the boundary penalty term of the mth unmanned aerial vehicle at the tth moment in step 503 is obtained as follows:
step 5031, set the upper bound of the ground area D on the X axis as u_max,x, the upper bound on the Y axis as u_max,y, the lower bound on the X axis as u_min,x, and the lower bound on the Y axis as u_min,y, with u_min,x = u_min,y = 0;
step 5032, the computer obtains, from the position of the mth unmanned aerial vehicle at the tth moment, its X coordinate and its Y coordinate at the tth moment;
step 5033, when the X coordinate of the mth unmanned aerial vehicle at the tth moment is greater than u_max,x or less than u_min,x, the computer obtains the boundary penalty term of the mth unmanned aerial vehicle at the tth moment according to the formula, where r_b denotes the penalty factor and κ_b denotes the gradient factor that determines the smoothness of the boundary function; the penalty factor r_b ranges from 10 to 50 and the gradient factor κ_b ranges from 0.07 to 0.1;
when the Y coordinate of the mth unmanned aerial vehicle at the tth moment is greater than u_max,y or less than u_min,y, the computer obtains the boundary penalty term of the mth unmanned aerial vehicle at the tth moment according to the corresponding formula;
when the X coordinate is greater than u_max,x and the Y coordinate is greater than u_max,y, or the X coordinate is less than u_min,x and the Y coordinate is less than u_min,y, the computer obtains the boundary penalty term of the mth unmanned aerial vehicle at the tth moment according to the corresponding formula.
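The three penalty formulas of step 5033 are not reproduced in this text, so the sketch below uses a hypothetical smooth form purely to illustrate the case structure: each out-of-bounds axis contributes r_b·(1 - exp(-κ_b·excess)), where the excess is the distance outside [u_min, u_max] on that axis, and the two axis terms are summed when both coordinates are out of bounds. The exponential shape, the default values r_b = 20 and κ_b = 0.08 (chosen inside the claimed ranges), and the function name are all assumptions, not the patent's formula.

```python
import math

def boundary_penalty(x, y, u_max_x, u_max_y, r_b=20.0, kappa_b=0.08,
                     u_min_x=0.0, u_min_y=0.0):
    """Hypothetical smooth boundary penalty magnitude (not the patent's formula).

    Inside the ground area D the penalty is 0. Outside, each axis adds
    r_b * (1 - exp(-kappa_b * excess)); kappa_b controls how quickly the
    penalty grows with distance past the boundary. The returned value is a
    positive magnitude, intended to be subtracted from the UAV reward.
    """
    def axis_term(coord, lo, hi):
        excess = max(coord - hi, lo - coord, 0.0)   # 0 when within [lo, hi]
        return r_b * (1.0 - math.exp(-kappa_b * excess))

    return axis_term(x, u_min_x, u_max_x) + axis_term(y, u_min_y, u_max_y)
```

Summing per-axis terms covers all four corner cases, including mixed ones such as X above u_max,x with Y below u_min,y.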
5. The method for optimizing cooperative communication between multiple unmanned aerial vehicles and users based on the MAPPO algorithm according to claim 1, wherein in step 301 the value of w ranges from 3 to 20;
α_φ and β_φ in step 306 satisfy α_φ ≥ 1 and β_φ ≥ 1.
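Step 306 is not included in this excerpt. If α_φ and β_φ are the parameters of a Beta-distributed Actor policy over a bounded continuous action (an assumption), the constraint α_φ ≥ 1, β_φ ≥ 1 keeps the density finite at both ends of the interval and unimodal (uniform when both equal 1). A common way to enforce it is softplus(z) + 1 on the raw network outputs; the sketch below does that and rescales a sample to an action range [low, high]. Function names and the rescaling are hypothetical.

```python
import numpy as np

def _softplus(z):
    """Numerically stable log(1 + e^z)."""
    return np.logaddexp(0.0, z)

def beta_params_from_logits(z_alpha, z_beta):
    """Map raw Actor outputs to Beta parameters satisfying alpha, beta >= 1."""
    return _softplus(z_alpha) + 1.0, _softplus(z_beta) + 1.0

def sample_action(z_alpha, z_beta, low, high, rng):
    """Sample u ~ Beta(alpha, beta) on [0, 1] and rescale it to [low, high]."""
    alpha, beta = beta_params_from_logits(z_alpha, z_beta)
    u = rng.beta(alpha, beta)
    return low + (high - low) * u
```

A bounded Beta policy avoids the clipping bias that a Gaussian policy incurs when actions such as flight displacement or transmit power must stay inside fixed limits.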
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110806485.3A CN113359480B (en) | 2021-07-16 | 2021-07-16 | Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113359480A (en) | 2021-09-07 |
CN113359480B (en) | 2022-02-01 |
Family ID: 77539837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110806485.3A Active CN113359480B (en) | 2021-07-16 | 2021-07-16 | Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113359480B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114337785A (en) * | 2021-12-30 | 2022-04-12 | 陕西锐远信息科技有限公司 | Solar unmanned aerial vehicle communication energy management strategy, system, terminal and storage medium |
CN114363340B (en) * | 2022-01-12 | 2023-12-26 | 东南大学 | Unmanned aerial vehicle cluster failure control method, system and storage medium |
CN114895710A (en) * | 2022-05-31 | 2022-08-12 | 中国人民解放军陆军工程大学 | Control method and system for autonomous behavior of unmanned aerial vehicle cluster |
CN114915998B (en) * | 2022-05-31 | 2023-05-05 | 电子科技大学 | Channel capacity calculation method for unmanned aerial vehicle-assisted ad hoc network communication system |
CN115484205B (en) * | 2022-07-12 | 2023-12-01 | 北京邮电大学 | Deterministic network routing and queue scheduling method and device |
CN115494732B (en) * | 2022-09-29 | 2024-04-12 | 湖南大学 | Unmanned aerial vehicle trajectory design and power allocation method based on proximal policy optimization |
CN118113482B (en) * | 2024-04-26 | 2024-08-13 | 北京科技大学 | Secure computation offloading method and system against intelligent eavesdroppers |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110404264A (en) * | 2019-07-25 | 2019-11-05 | 哈尔滨工业大学(深圳) | Method, device, system and storage medium for solving multi-player imperfect-information game strategies based on virtual self-play |
CN110488861A (en) * | 2019-07-30 | 2019-11-22 | 北京邮电大学 | Unmanned aerial vehicle trajectory optimization method and device based on deep reinforcement learning, and unmanned aerial vehicle |
CN111786713A (en) * | 2020-06-04 | 2020-10-16 | 大连理工大学 | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning |
CN111880563A (en) * | 2020-07-17 | 2020-11-03 | 西北工业大学 | Multi-unmanned aerial vehicle task decision method based on MADDPG |
WO2021033486A1 (en) * | 2019-08-22 | 2021-02-25 | オムロン株式会社 | Model generation device, model generation method, control device, and control method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3725471A1 (en) * | 2019-04-16 | 2020-10-21 | Robert Bosch GmbH | Configuring a system which interacts with an environment |
Non-Patent Citations (3)
Title |
---|
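Efficient Deployment With Geometric Analysis for mmWave UAV Communications; Jianwei Zhao et al.; IEEE Wireless Communications Letters; 2020-07-31; vol. 9, no. 7, pp. 1115-1119 * |
A fairness-based intelligent resource scheduling method for UAV base station communications; 吴官翰 et al.; 中兴通讯技术 (ZTE Technology Journal); 2021-04-30; vol. 27, no. 2, pp. 31-36 * |
Distributed networking and access selection algorithm for UAV backbone networks; 吴炜钰 et al.; 计算机学报 (Chinese Journal of Computers); 2019-02-28; vol. 42, no. 2, pp. 121-137 * |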
Also Published As
Publication number | Publication date |
---|---|
CN113359480A (en) | 2021-09-07 |
Similar Documents
Publication | Title |
---|---|
CN113359480B (en) | Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm |
CN111786713B (en) | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning |
CN108419286B (en) | Joint beam and power allocation method for 5G unmanned aerial vehicle communication |
CN111193536A (en) | Multi-unmanned aerial vehicle base station trajectory optimization and power allocation method |
CN114169234B (en) | Scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge computing |
CN111970709B (en) | Unmanned aerial vehicle relay deployment method and system based on particle swarm optimization algorithm |
CN113660681B (en) | Multi-agent resource optimization method applied to unmanned aerial vehicle cluster-assisted transmission |
CN115278729B (en) | Unmanned aerial vehicle cooperative data collection and data offloading method in the marine Internet of Things |
CN115499921A (en) | Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network |
CN112203289A (en) | Aerial base station network deployment method for area coverage by clustered unmanned aerial vehicles |
Hajiakhondi-Meybodi et al. | Joint transmission scheme and coded content placement in cluster-centric UAV-aided cellular networks |
CN113115344A (en) | Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization |
CN117499867A (en) | Method for energy-efficient computation offloading via a policy gradient algorithm in multi-unmanned aerial vehicle-assisted mobile edge computing |
CN113919483A (en) | Method and system for radio map construction and positioning in wireless communication network |
CN112702713B (en) | Low-altitude unmanned aerial vehicle communication deployment method under multiple constraints |
CN116684852B (en) | Unmanned aerial vehicle communication resource and hovering position planning method for dense mountain forest environments |
CN116249202A (en) | Joint positioning and computing support method for Internet of Things devices |
CN116321181A (en) | Online trajectory and resource optimization method for multi-unmanned aerial vehicle-assisted edge computing |
CN115765826A (en) | Unmanned aerial vehicle network topology reconstruction method for on-demand service |
CN114980205A (en) | QoE (quality of experience) maximization method and device for multi-antenna unmanned aerial vehicle video transmission system |
Li et al. | Resource optimization for multi uav formation communication based on dqsenet |
CN117750505A (en) | Space-ground integrated network slicing resource allocation method |
CN117858105B (en) | Multi-unmanned aerial vehicle cooperative set partitioning and deployment method in complex electromagnetic environment |
CN118764879A (en) | Energy-efficient fair communication coverage method for RIS-assisted multiple unmanned aerial vehicles |
CN117479237A (en) | Data caching and distribution method and system based on three-dimensional matching |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |