
CN116113025A - Track design and power distribution method in unmanned aerial vehicle cooperative communication network - Google Patents


Info

Publication number
CN116113025A
CN116113025A (application CN202310121573.9A)
Authority
CN
China
Prior art keywords
node
energy
unmanned aerial vehicle
power distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310121573.9A
Other languages
Chinese (zh)
Inventor
陆永安
唐洪莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202310121573.9A
Publication of CN116113025A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/18 TPC being performed according to specific parameters
    • H04W52/24 TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
    • H04W52/241 TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/Io
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/14 Relay systems
    • H04B7/15 Active relay systems
    • H04B7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B7/18502 Airborne stations
    • H04B7/18506 Communications with or from aircraft, i.e. aeronautical mobile service
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/30 TPC using constraints in the total amount of available transmission power
    • H04W52/34 TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading
    • H04W52/343 TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading taking into account loading or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/30 TPC using constraints in the total amount of available transmission power
    • H04W52/36 TPC using constraints in the total amount of available transmission power with a discrete range or set of values, e.g. step size, ramping or offsets
    • H04W52/367 Power values between minimum and maximum limits, e.g. dynamic range
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Charge And Discharge Circuits For Batteries Or The Like (AREA)

Abstract

The invention relates to a track design and power distribution method in an unmanned aerial vehicle cooperative communication network, which comprises the following steps: establishing a power distribution model with the objective of making the final virtual residual energy of each node as close as possible to the node's maximum battery capacity; and solving the power distribution model by adopting a reinforcement learning distance reward and punishment algorithm to obtain an optimization scheme for node resource distribution and residual energy. The invention enables the node energy of the whole UAV-assisted network to be distributed more uniformly.

Description

Track design and power distribution method in unmanned aerial vehicle cooperative communication network
Technical Field
The invention relates to the technical fields of wireless communication technology and Internet of things, in particular to a track design and power distribution method in an unmanned aerial vehicle cooperative communication network.
Background
In conventional scenarios, the battery of a wireless sensor network node is limited and cannot sustain heavy energy consumption. Radio-frequency-based energy harvesting may be considered a promising approach to extending the lifetime of energy-constrained wireless sensor network nodes. In addition, massive numbers of ground network nodes have large and frequent communication needs. A wireless powered communication network (WPCN) integrates wireless energy transfer and wireless data transfer, providing a viable solution for energy-constrained wireless sensor network nodes.
The unmanned aerial vehicle has the advantages of flexible deployment, strong line-of-sight (LoS) channels to ground users, and controllable mobility, and is widely applied in cargo transportation, aerial monitoring, photography, the industrial Internet of things, and more. For example, a drone may serve as a mobile relay to facilitate information exchange between remote ground users, or as a mobile base station (BS) to enhance wireless coverage and network capacity for ground mobile users. In addition, a drone may serve as a mobile energy transmitter to charge low-power wireless sensor network nodes (WDs) on the ground. By exploiting its fully controllable mobility, the unmanned aerial vehicle can adjust its position over time to reduce its distance to the target ground user, thereby improving the data transmission and energy transmission efficiency of the wireless powered communication network (WPCN).
In the prior art, most schemes adopt a "shortest sequence WPCN" design: the unmanned aerial vehicle flies at a constant speed from the starting point to each node position, performs data transmission to the nodes during flight, and, when directly above a node, performs fixed-point energy transmission to all nodes. Meanwhile, conventional convex optimization algorithms are commonly used for the solution: successive convex approximation, block gradient descent, and penalty function methods convert the problem into a series of convex problems that are then solved iteratively. A still simpler approach, known as "static WPCN", deploys an energy-carrying communication device in the center of the network, typically with a fixed charging-and-communication schedule, e.g. a fixed 10 s of data transmission to each node followed by 10 s of energy transmission to each node.
None of the above methods takes into account the remaining energy of the nodes. However, node energy in a wireless sensor network is often imbalanced. Because the sensor nodes around a hub node also forward the data of other nodes, they consume energy faster, causing those nodes to die sooner and even interrupting the transmission path, the so-called energy hole. In addition, a sensing node is dormant most of the time, so sensor network nodes in areas with abnormal conditions have higher energy consumption and higher data transmission demands. The inconsistent residual energy of sensor network nodes therefore leads to inconsistent energy replenishment demands, and it is very necessary to replenish energy according to the differing energy demands of the nodes. On the other hand, after a node reaches its maximum energy, its remaining energy does not change even if it receives more energy; charging it again with the drone would then waste energy. Therefore, optimizing the remaining energy of the nodes jointly with the maximum battery energy allows the node energy of the whole UAV-assisted network to be distributed more uniformly.
Furthermore, the node resource allocation and residual energy optimization problem can be decoupled into two core constrained sub-problems: first, a minimum data-reception constraint on node resource allocation; second, the remaining-capacity optimization. However, because the coupling between the drone trajectory and the resource allocation design variables involves complex data throughput and energy collection functions, the problem is a non-convex mixed-integer program, and it is difficult to obtain an optimal solution. Moreover, conventional optimization algorithms such as gradient descent, coordinate descent, Newton iteration, and genetic algorithms have many defects and shortcomings. For example, when the number of nodes and the transmission space are too large, the iteration difficulty increases exponentially and the iteration speed becomes very slow. The solution result also depends strongly on the initial value: if the transmission parameters of the nodes change, for example if the initial residual energies of the nodes are inconsistent or the spatial geographic positions of the nodes differ, the model must be iteratively trained again. Such methods cannot realize the advantage of online learning, have weak generalization ability, and adapt poorly to the environment.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a track design and power distribution method in an unmanned aerial vehicle cooperative energy communication network, so that node energy across the whole UAV-assisted network is distributed more uniformly.
The technical scheme adopted to solve the technical problem is as follows: the track design and power distribution method in the unmanned aerial vehicle cooperative communication network comprises the following steps:
establishing a power distribution model with the objective of making the final virtual residual energy of each node as close as possible to the node's maximum battery capacity;
and solving the power distribution model by adopting a reinforcement learning distance reward and punishment algorithm to obtain an optimization scheme for node resource distribution and residual energy.
The objective function of the power distribution model is:

$$\min_{q,\,z,\,p}\ \sum_{k=1}^{K}\left|E_k^{v}-E_{\max}\right|$$

where the final virtual residual energy of node k is

$$E_k^{v}=E_k^{0}+\sum_{n=1}^{N}\left[\frac{\eta\beta_0 P z_0(n)\delta_t}{H^2+\|q(n)-x_k\|^2}-z_k(n)p_k(n)\delta_t\right]$$

Here K represents the number of nodes, $E_{\max}$ the maximum battery energy of node k, $E_k^0$ the initial energy of node k, N the maximum step length of the flight period, $\eta$ the energy-reception conversion efficiency of the node, $\beta_0$ the channel gain at unit distance, and P the fixed power at which the unmanned aerial vehicle broadcasts energy. $z_0(n)$ and $z_k(n)$ are binary vectors: when $z_k(n)=1$ and $z_0(n)=0$, the unmanned aerial vehicle performs uplink data communication with node k in time slot n, and when $z_k(n)=0$ and $z_0(n)=1$, it performs downlink energy broadcasting in time slot n. $q(n)$ represents the flight coordinate of the unmanned aerial vehicle, $x_k$ the coordinates of node k, H the flight altitude of the unmanned aerial vehicle, $\delta_t$ the unit flight time slot, and $p_k(n)$ the uplink communication transmit power of node k in time slot n.
The constraint conditions of the power distribution model are:

$$R_k=\frac{1}{N}\sum_{n=1}^{N}z_k(n)\log_2\!\left(1+\frac{p_k(n)\,\beta_0}{\sigma^2\left(H^2+\|q(n)-x_k\|^2\right)}\right)\ge R_{\min},\quad\forall k$$

$$E_{\max}\ge E_k(n)\ge 0,\quad\forall k,n$$

where K represents the number of nodes, T the flight period, $R_k$ the data throughput of node k during the flight period, $\sigma^2$ the power of the additive white Gaussian noise at the node, $q(n)$ the flight coordinate of the unmanned aerial vehicle in time slot n, and $R_{\min}$ a preset threshold. $E_k(n)$ represents the remaining energy of node k in time slot n, which evolves as

$$E_k(n)=E_k(n-1)+\frac{\eta\beta_0 P z_0(n)\delta_t}{H^2+\|q(n)-x_k\|^2}-z_k(n)p_k(n)\delta_t$$
When the reinforcement learning distance reward and punishment algorithm is adopted to solve the power distribution model, the power distribution model is described as a Markov decision process. The state space comprises three parts: the current geographic position of the unmanned aerial vehicle, the past n historical actions, and the data throughput of the current n nodes. The action space is defined as the movement direction of the unmanned aerial vehicle in each time slot, together with either an energy transmission action towards all nodes or a data transmission action towards a single node, where in each step the unmanned aerial vehicle can take only one of the energy transmission action and the data transmission action. The reward function comprises an average achievable rate reward function over the uplink nodes in each period and a maximum remaining energy difference reward function. The transition probability matrix is set to the deterministic probability 1.
The average achievable rate reward function of the uplink nodes in each period is

$$R_1(t)=\frac{1}{K}\sum_{k=1}^{K}R_k$$

where K represents the number of nodes, T the flight period, and $R_k$ the data throughput of node k during the flight period.

The maximum remaining energy difference reward function is

$$R_2(t)=-\sum_{k=1}^{K}\left|E_k^{v}-E_{\max}\right|$$

where $E_k^v$ is the final virtual residual energy of node k, computed from the initial energy $E_k^0$, the energy-reception conversion efficiency $\eta$, the unit-distance channel gain $\beta_0$, the fixed broadcast power P, the binary vectors $z_0(n)$ and $z_k(n)$, the flight coordinate q of the unmanned aerial vehicle, the node coordinates $x_k$, the flight altitude H, the unit flight time slot $\delta_t$, and the uplink communication transmit power $p_k(n)$ of node k in time slot n; $E_{\max}$ represents the maximum battery energy of node k.
Advantageous effects
Owing to the adoption of the above technical scheme, compared with the prior art, the invention has the following advantages and positive effects: the method adopted by the invention can perform energy transmission not only directly above the nodes but also between two nodes, with a higher proportion of energy transmission occurring between nodes. The invention optimizes the residual energy of the nodes jointly with the maximum battery energy, so that node energy across the whole UAV-assisted network is distributed more uniformly. In addition, compared with conventional methods, the total energy collection difference is lower and the energy optimization is better, so the data and energy optimization problem of the unmanned aerial vehicle WPCN network can be better satisfied, achieving energy savings with higher practicability.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a drone communication system in an embodiment of the present invention;
FIG. 3 is a flow chart of the reinforcement learning distance reward and punish algorithm in an embodiment of the present invention;
fig. 4 is a horizontal flight trajectory diagram of a drone employing an embodiment of the present invention in the examples;
FIG. 5 is a graph comparing average throughput rates using embodiments of the present invention and prior art;
FIG. 6 is a comparison graph of node energy harvesting employing an embodiment of the present invention and prior art;
fig. 7 is a graph comparing total collected energy differences using embodiments of the present invention and prior art.
Detailed Description
The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Further, it is understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents are intended to fall within the scope of the claims appended hereto.
The embodiment of the invention relates to a track design and power distribution method in an unmanned aerial vehicle cooperative communication network, as shown in fig. 1, comprising the following steps: (1) establishing a power distribution model with the objective of making the final virtual residual energy of each node as close as possible to the node's maximum battery capacity; (2) solving the power distribution model by adopting a reinforcement learning distance reward and punishment algorithm to obtain an optimization scheme for node resource distribution and residual energy. Step (1) establishes the system model of the problem; step (2) provides the reinforcement learning distance reward and punishment algorithm that solves it. The details are as follows:
In this embodiment, an unmanned-aerial-vehicle-assisted wireless sensor network is considered, as shown in fig. 2. The coordinates of the sensing nodes are $x_k\in\mathbb{R}^{2}$, $k=1,\dots,K$. A rotor unmanned aerial vehicle carrying energy-and-information communication equipment flies at a fixed altitude H with flight period T; the unit flight time slot is $\delta_t=T/N$, where N is the maximum step length of the flight period, and the flight coordinate of the unmanned aerial vehicle in any time slot n is $q(n)$, $n=1,\dots,N$.
The core task of the wireless sensor network is data transmission. The communication channel between the unmanned aerial vehicle and a node is assumed to be a line-of-sight link, and the Doppler effect caused by the movement of the unmanned aerial vehicle is assumed to be perfectly compensated. According to the free-space path loss model, the channel gain from the unmanned aerial vehicle to node k in time slot n is

$$h_k(n)=\frac{\beta_0}{H^2+\|q(n)-x_k\|^2}$$

where $\beta_0$ represents the channel gain at unit distance (1 meter). Let $p_k(n)$ denote the uplink communication transmit power of node k in time slot n; the data transmission rate of node k in time slot n is then

$$R_k(n)=\log_2\!\left(1+\frac{p_k(n)\,h_k(n)}{\sigma^2}\right)$$

where $\sigma^2$ represents the power of the additive white Gaussian noise (AWGN) at the node. Thus the data throughput of each node during the flight period is

$$R_k=\frac{1}{N}\sum_{n=1}^{N}z_k(n)\,R_k(n)$$
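As a minimal numerical sketch of the free-space channel gain, per-slot rate, and average throughput defined above (the constants `BETA_0`, `SIGMA2`, and `H` are illustrative assumptions, not the patent's simulation settings):

```python
import math

BETA_0 = 1e-3   # assumed channel gain at 1 m (about -30 dB)
SIGMA2 = 1e-11  # assumed AWGN power at the node (about -80 dBm)
H = 5.0         # assumed UAV flight altitude in metres

def channel_gain(q, x_k):
    """h_k(n) = beta_0 / (H^2 + ||q(n) - x_k||^2), free-space path loss."""
    d2 = (q[0] - x_k[0]) ** 2 + (q[1] - x_k[1]) ** 2
    return BETA_0 / (H ** 2 + d2)

def rate_bps_hz(q, x_k, p_k):
    """Achievable uplink rate of node k in one slot: log2(1 + p*h/sigma^2)."""
    return math.log2(1.0 + p_k * channel_gain(q, x_k) / SIGMA2)

def throughput(traj, x_k, p, z_k):
    """Average rate over N slots; z_k[n] = 1 only in slots assigned to node k."""
    N = len(traj)
    return sum(z_k[n] * rate_bps_hz(traj[n], x_k, p[n]) for n in range(N)) / N
```

For example, hovering directly above a node (`q == x_k`) maximizes the gain $\beta_0/H^2$, and the rate falls off as the squared horizontal distance grows.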
Owing to the broadcast nature of wireless transmission, all nodes receive energy simultaneously when the unmanned aerial vehicle broadcasts energy in the downlink. The energy-and-information communication of the unmanned aerial vehicle is assumed to use time division multiplexing: for any time slot n of node k there exist binary vectors $z_k(n)$ and $z_0(n)$. When $z_k(n)=1$ and $z_0(n)=0$, the unmanned aerial vehicle performs uplink data communication with node k in time slot n; when $z_k(n)=0$ and $z_0(n)=1$, it performs downlink energy broadcasting in time slot n. This strategy implies

$$z_0(n)+\sum_{k=1}^{K}z_k(n)=1,\quad\forall n$$
Assume the unmanned aerial vehicle broadcasts energy at a fixed power P, the energy-reception conversion efficiency of a node is $\eta$, the initial energy of node k is $E_k^0$, and the maximum battery energy of a node is $E_{\max}$. In time slot n, node k may both harvest energy from the unmanned aerial vehicle and consume energy for data transmission, so the current remaining energy $E_k(n)$ is:

$$E_k(n)=E_k(n-1)+\frac{\eta\beta_0 P z_0(n)\delta_t}{H^2+\|q(n)-x_k\|^2}-z_k(n)p_k(n)\delta_t,\qquad E_k(0)=E_k^0$$
In order to balance the energy of the network nodes, the final virtual remaining energy of node k, $E_k^v$, is defined as:

$$E_k^{v}=E_k^{0}+\sum_{n=1}^{N}\left[\frac{\eta\beta_0 P z_0(n)\delta_t}{H^2+\|q(n)-x_k\|^2}-z_k(n)p_k(n)\delta_t\right]$$

Unlike $E_k(n)$, the final virtual residual energy $E_k^v$ is not limited by the maximum battery energy $E_{\max}$. The closer a node's final virtual remaining energy is to the maximum battery energy, the more balanced the energy of the network nodes and the less energy is wasted. For node k, how close the final virtual remaining energy is to the maximum battery capacity can be measured by the absolute value of their difference.
The aim of this embodiment is to jointly optimize the flight trajectory and the power distribution of the unmanned aerial vehicle so that, over the whole charging process, the final virtual residual energy of each node is closest to the maximum capacity of its battery. The problem can thus be expressed as:

$$\min_{q,\,z,\,p}\ \sum_{k=1}^{K}\left|E_k^{v}-E_{\max}\right|$$

$$\text{s.t.}\quad R_k\ge R_{\min},\quad\forall k\qquad(Q.1)$$

$$E_{\max}\ge E_k(n)\ge 0\qquad(Q.2)$$

$$z_0(n)+\sum_{k=1}^{K}z_k(n)=1,\quad z_0(n),z_k(n)\in\{0,1\}\qquad(Q.3)$$

$$p_k(n)\ge 0\qquad(Q.4)$$

The first constraint (Q.1) requires that the data throughput the unmanned aerial vehicle collects from each node in each period exceed a threshold $R_{\min}$; the second constraint (Q.2) requires that the remaining energy $E_k(n)$ in every time slot n neither exceed the maximum battery energy $E_{\max}$ of the node nor fall below 0.
The reinforcement learning problem may be described as a Markov decision process (MDP) defined by a 4-tuple {S, A, P, R}. Specifically, $S=\{s_1,s_2,\dots,s_m\}$ is the state space; $A=\{a_1,a_2,\dots,a_m\}$ represents the action space; R represents the reward function, where R(s, a) is the reward for performing action a in state s; and P represents the transition probability matrix. The optimal policy is obtained through the interaction of the RL agent with the environment. The whole interaction process is as follows: the RL agent observes the environment and obtains the current state $s_t\in S$. In this state it selects and performs an action $a_t\in A$, after which the environment moves to state $s_{t+1}$; at the end of the step, the RL agent receives a reward $r_t$ according to the circumstances.
In this embodiment, the 4-tuple {S, A, P, R} is defined as follows:

1) State space S(t): in an MDP the system state should be observable and accessible; a well-chosen state allows the model to converge faster, and the end result can converge to sub-optimal or even optimal values. The state space S consists of three parts: 1. the current geographic position of the unmanned aerial vehicle, $V(t)=\{x(t),y(t)\}$; 2. the past n historical actions, $A_n(t)=\{a_t,a_{t-1},\dots,a_{t-n}\}$; 3. the data throughput of the current n nodes, $G_n(t)=\{g_1(t),g_2(t),\dots,g_n(t)\}$. Thus the state space is denoted $S(t)=\{V(t),A_n(t),G_n(t)\}$.

2) Action space A(t): the action of the drone may be defined as its direction of motion in each time slot, $A_1=\{\text{up, down, left, right, stop}\}$, together with an energy transmission action $A_2$ towards all nodes or a data transmission action $A_3$ towards a single node. In each step the drone can take only one of the energy transmission action $A_2$ and the data transmission action $A_3$. Thus $A(t)=\{A_1,(A_2\text{ or }A_3)\}$.

3) Reward function R(t): since this embodiment focuses on an MDP with a limited time range, i.e. a movement of T steps, the reward of each step may consist of the total uplink rate in each time slot and the maximum remaining energy difference of the nodes; the key factors affecting the reward size are the UAV-to-node distance change process and the path selection process. The average achievable rate reward function of the uplink in each period is

$$R_1(t)=\frac{1}{K}\sum_{k=1}^{K}R_k$$

and the maximum remaining energy difference reward function is

$$R_2(t)=-\sum_{k=1}^{K}\left|E_k^{v}-E_{\max}\right|$$

Thus the reward function is $R(t)=\{R_1(t),R_2(t)\}$.

4) Transition probability matrix P: the state transition probability $P(S_{t+1}\mid S_t,A_t)$ is defined as the probability distribution over the next state given the current state and the action taken, and characterizes the dynamics of the overall system. In the present embodiment, for convenience, the transition probability matrix is set to the deterministic probability 1: from the current state $S_t$, performing action $A_t$ can only reach the unique state $S_{t+1}$.
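A minimal environment sketch of this MDP with deterministic transitions, the five movement directions, and the mutually exclusive charge-or-communicate choice per step. The grid size, node layout, and the toy distance-based rate proxy are illustrative assumptions, not the patent's model:

```python
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0),
         "right": (1, 0), "stop": (0, 0)}

class UavWpcnEnv:
    """Toy MDP environment: state = (position V(t), recent actions A_n(t),
    per-node throughputs G_n(t)); transitions are deterministic (P = 1)."""

    def __init__(self, nodes, grid=20):
        self.nodes, self.grid = nodes, grid
        self.reset()

    def reset(self):
        self.pos = [0, 0]
        self.history = []                          # past actions A_n(t)
        self.throughput = [0.0] * len(self.nodes)  # G_n(t)
        return self.state()

    def state(self):
        return (tuple(self.pos), tuple(self.history[-3:]),
                tuple(round(g, 3) for g in self.throughput))

    def step(self, move, target):
        # target is None for a downlink energy broadcast (A_2), or a node
        # index for uplink data transmission (A_3) -- never both in one step.
        dx, dy = MOVES[move]
        self.pos[0] = min(self.grid, max(0, self.pos[0] + dx))
        self.pos[1] = min(self.grid, max(0, self.pos[1] + dy))
        if target is not None:
            x, y = self.nodes[target]
            d2 = (self.pos[0] - x) ** 2 + (self.pos[1] - y) ** 2
            self.throughput[target] += 1.0 / (1.0 + d2)  # toy rate proxy
        self.history.append((move, target))
        reward = sum(self.throughput) / len(self.nodes)  # R_1-style term
        return self.state(), reward
```

A full implementation would replace the toy rate proxy with the $\log_2$ rate model and add the $R_2$ energy-balance term to the reward.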
Obviously, the action space of the MDP defined above is discrete, the transition probability matrix P is deterministic, and the state-action space is huge, so table-based reinforcement learning methods are not applicable. That is, the MDP problem of node resource allocation and residual energy optimization should be solved with a DQN algorithm suited to discrete action spaces in reinforcement learning (see fig. 3). Finally, combining a distance reward and punishment function, an RD-DQN algorithm is proposed; its pseudocode is given as a figure in the original document.
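Since the RD-DQN pseudocode survives only as a figure, the following is a deliberately simplified stand-in: tabular Q-learning with a distance-based reward shaping term (the "RD" idea of rewarding proximity to the target node). A faithful RD-DQN would replace the Q-table with a neural network plus experience replay; the node layout, hyperparameters, and reward are all illustrative assumptions:

```python
import random

random.seed(0)

NODES = [(1, 1), (6, 2), (3, 5)]                       # assumed node layout
ACTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0), (0, 0)]   # up/down/left/right/stop

def shaped_reward(pos, target):
    """Distance reward-and-punishment: closer to the target node is better."""
    return -(abs(pos[0] - target[0]) + abs(pos[1] - target[1]))

def train(episodes=200, steps=20, alpha=0.5, gamma=0.9, eps=0.2):
    """Epsilon-greedy tabular Q-learning toward a fixed target node."""
    q = {}
    for _ in range(episodes):
        pos, target = (0, 0), NODES[0]
        for _ in range(steps):
            key = (pos, target)
            q.setdefault(key, [0.0] * len(ACTIONS))
            a = (random.randrange(len(ACTIONS)) if random.random() < eps
                 else max(range(len(ACTIONS)), key=lambda i: q[key][i]))
            nxt = (pos[0] + ACTIONS[a][0], pos[1] + ACTIONS[a][1])
            r = shaped_reward(nxt, target)
            nkey = (nxt, target)
            q.setdefault(nkey, [0.0] * len(ACTIONS))
            # Standard temporal-difference update toward r + gamma * max Q(s')
            q[key][a] += alpha * (r + gamma * max(q[nkey]) - q[key][a])
            pos = nxt
    return q
```

After training, the greedy policy at the start state prefers moving toward the node over stopping, illustrating how the distance shaping steers the trajectory.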
an example of the application of the present method to an actual scene is listed below.
For the unmanned aerial vehicle WPCN communication system, the number of nodes is set to K = 6, the flight altitude of the unmanned aerial vehicle is H = 5 m, the maximum horizontal flight speed is $V_L$ = 10 m/s, the maximum transmit power of the energy-carrying unmanned aerial vehicle is P = 10 W, and the flight starting point of the unmanned aerial vehicle is $q_0=[0,0,5]^T$. The node noise power is $\sigma^2$ = −80 dBm, the channel gain at distance 1 m is $\beta_0$ = −30 dB, the time slot interval is $\delta_t$ = 0.1 s, the transmit power of a node to the unmanned aerial vehicle is $Q_k$ = 0.0001 W, the energy collection efficiency is η = 50%, the node maximum energy is $E_{\max}$ = 0.004 J, and the node initial energy $E_i$ obeys a uniform distribution (its probability density function is given as a figure in the original document). The nodes are distributed on a plane of length and width 20 m, with initial positions $K_1=[1,1]^T$ m, $K_2=[6,2]^T$ m, $K_3=[12,5]^T$ m, $K_4=[15,10]^T$ m, $K_5=[10,15]^T$ m, $K_6=[14,18]^T$ m, and the node minimum average throughput rate requirement is $R_{\min}$ = 0.75 bps/Hz. In the simulation, the nodes are numbered node 1, node 2, …, node 6 in ascending order of their positions on the x-axis, and the unmanned aerial vehicle records the data transmission condition $R_k$ and the energy harvesting condition $E^v$ of each node at each moment. Two comparison schemes are considered. In the first, the unmanned aerial vehicle flies at a constant speed from the starting point to each node position, performs data transmission to the nodes during flight, and performs fixed-point energy transmission when directly above a node; this model is named "shortest sequence WPCN". In the second, the unmanned aerial vehicle is deployed at the network center $q=[10,10,5]^T$ m and performs static data transmission and energy transmission; this is named "static WPCN". The scheme proposed by this embodiment is named "RD-DQN-WPCN".
Fig. 4 is the horizontal flight trajectory diagram of the "RD-DQN-WPCN" unmanned aerial vehicle. It can be seen that the flight trajectory roughly follows the node distribution, matching the expected behaviour. Moreover, the algorithm can transmit energy not only directly above the nodes but also between two nodes, with a higher proportion of energy transmission occurring between nodes; this contrasts clearly with the "shortest sequence WPCN", which transmits energy only directly above the nodes, showing that the proposed algorithm is more flexible.
Fig. 5 compares the average throughput rates of the three "WPCN" schemes. Only the proposed "RD-DQN-WPCN" scheme meets the minimum average throughput rate requirement $R_{\min}$ = 0.75 bps/Hz. By contrast, the "static WPCN" result is the worst: the average throughput rate meets the requirement only for nodes 3–5, while nodes 1 and 6 reach only 0.57 bps/Hz and 0.65 bps/Hz, the average throughput rate being inversely related to the distance from the node to the network center. The "shortest sequence WPCN" achieves the highest average throughput rates for nodes 1–5, but node 6 does not meet the minimum requirement: because the transmission time is fixed and the earlier nodes occupy a slightly higher share of the time when transmitting their data, less transmission time remains for the end node, whose average throughput rate is only 0.72 bps/Hz.
Fig. 6 compares the virtual residual energy $E^v$ of each node, where the vertical axis is the amount of $E^v$ in joules (J). As can be seen from fig. 6, compared with the other two schemes, under the proposed "RD-DQN-WPCN" scheme the virtual residual energy $E^v$ of every node except node 1 is closer to the maximum node energy $E_{\max}$, fluctuating around $E_{\max}$ with the smallest variation amplitude.
Fig. 7 compares the node energy harvesting of the three schemes. The total collected energy difference $E_{dif}$ is defined as the sum over nodes of the absolute difference between the single-node virtual residual energy $E^v$ and the maximum energy $E_{\max}$:

$$E_{dif}=\sum_{k=1}^{K}\left|E_k^{v}-E_{\max}\right|$$
As can be seen from fig. 7, the total collected energy difference is lowest for the "RD-DQN-WPCN" scheme, at only 0.0013711 J. The "shortest sequence WPCN" reaches 0.0022849 J, an energy difference 66.64% higher than the proposed scheme, and the "static WPCN" reaches 0.0055924 J, about 4.08 times (307.9% higher than) the proposed scheme. The energy received by the nodes under the two comparison schemes far exceeds the maximum battery capacity, so their energy utilization is lower. This further shows that "RD-DQN-WPCN" is optimal in energy optimization and can better satisfy the data and energy optimization problem of the unmanned aerial vehicle WPCN network, thereby achieving energy savings with higher practicability.
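The comparison metric above reduces to a one-line computation; a minimal sketch (the sample residual-energy values in the usage note are illustrative, not the paper's measurements):

```python
E_MAX = 0.004  # assumed maximum battery energy in joules

def energy_difference(virtual_residuals, e_max=E_MAX):
    """E_dif: total deviation of final virtual residual energies from E_max."""
    return sum(abs(e - e_max) for e in virtual_residuals)
```

For example, `energy_difference([0.0039, 0.0041, 0.0035])` sums the three per-node gaps (0.0001 + 0.0001 + 0.0005 J); a perfectly balanced network, with every node exactly at `E_MAX`, scores 0.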

Claims (6)

1. A trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network, characterized by comprising the following steps:
establishing a power distribution model with the objective that the final virtual residual energy of each node is as close as possible to the node's maximum battery capacity;
solving the power distribution model with a reinforcement-learning distance reward-and-punishment algorithm to obtain an optimization scheme for node resource allocation and residual energy.
2. The trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network according to claim 1, wherein the objective function of the power distribution model is:

min Σ_{k=1}^{K} | E_v^k(N) − E_max |

wherein E_v^k(N) represents the final virtual residual energy of node k, E_max represents the maximum battery energy of node k, and

E_v^k(N) = E_0^k + Σ_{n=1}^{N} [ η β_0 P z_0(n) Δt / (‖q(n) − x_k‖² + H²) − p_k(n) z_k(n) Δt ]

where E_0^k represents the initial energy of node k, N represents the maximum number of steps in the flight period, η represents the node's energy-reception conversion efficiency, β_0 represents the channel gain at unit distance, and P represents the fixed power at which the unmanned aerial vehicle broadcasts energy; z_0(n) and z_k(n) are binary vectors: when z_k(n) = 1 and z_0(n) = 0, the unmanned aerial vehicle performs uplink data communication with node k in time slot n, and when z_k(n) = 0 and z_0(n) = 1, the unmanned aerial vehicle performs downlink energy broadcasting in time slot n; q represents the flight coordinates of the unmanned aerial vehicle, x_k represents the coordinates of node k, H is the flight altitude of the unmanned aerial vehicle, Δt is the unit flight time slot, and p_k(n) represents the uplink communication transmit power of node k in time slot n.
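The energy bookkeeping in claim 2 can be sketched in a few lines. This is a hedged reconstruction assuming a free-space channel gain β_0/d² with d² = ‖q − x_k‖² + H², as the variable definitions suggest; all numeric parameters a caller passes in are illustrative:

```python
def virtual_residual_energy(e_init, q, x_k, z0, zk, p_k,
                            eta, beta0, P, H, dt):
    """Virtual residual energy E_v^k(N): initial energy plus downlink
    energy harvested in broadcast slots, minus uplink transmit energy
    in data slots, without capping at the battery maximum E_max."""
    e_v = e_init
    for n in range(len(q)):
        # Squared UAV-to-node distance, including flight altitude H.
        d2 = (q[n][0] - x_k[0]) ** 2 + (q[n][1] - x_k[1]) ** 2 + H ** 2
        e_v += eta * beta0 * P * z0[n] * dt / d2  # harvested (z0 = 1 slots)
        e_v -= p_k[n] * zk[n] * dt                # spent on uplink (zk = 1 slots)
    return e_v
```

Because E_v is not clipped at E_max, the objective |E_v^k(N) − E_max| penalizes both over-charging and under-charging a node.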
3. The trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network according to claim 2, wherein the constraint conditions of the power distribution model are:

R_k / T ≥ R_min, ∀ k ∈ {1, …, K};  E_k(n) ≥ 0, ∀ k, n

wherein K represents the number of nodes, T represents the flight period, R_k represents the data throughput of node k during the flight period,

R_k = Σ_{n=1}^{N} z_k(n) log₂( 1 + p_k(n) β_0 / ( σ² (‖q(n) − x_k‖² + H²) ) ) Δt

σ² represents the power of the additive white Gaussian noise at the node, q(n) represents the flight coordinates of the unmanned aerial vehicle in time slot n, and R_min represents a preset threshold; E_k(n) represents the remaining energy of node k in time slot n,

E_k(n) = E_0^k + Σ_{i=1}^{n} [ η β_0 P z_0(i) Δt / (‖q(i) − x_k‖² + H²) − p_k(i) z_k(i) Δt ].
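The throughput constraint of claim 3 can be checked with a short sketch. This assumes the standard log₂(1 + SNR) rate with the same free-space path loss as above; the rate expression is a reconstruction from the listed variables, not a verbatim copy of the patent's formula:

```python
import math

def average_rate(q, x_k, zk, p_k, beta0, sigma2, H, dt):
    """Average uplink spectral efficiency of node k (bps/Hz):
    throughput R_k accumulated over data slots, divided by the
    flight period T = N * dt."""
    N = len(q)
    throughput = 0.0
    for n in range(N):
        if zk[n]:
            d2 = (q[n][0] - x_k[0]) ** 2 + (q[n][1] - x_k[1]) ** 2 + H ** 2
            snr = p_k[n] * beta0 / (sigma2 * d2)
            throughput += math.log2(1.0 + snr) * dt
    return throughput / (N * dt)
```

A feasible solution must keep `average_rate(...) >= R_min` (0.75 bps/Hz in the simulations of Fig. 5) for every node while the remaining energy E_k(n) stays non-negative.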
4. The trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network according to claim 1, wherein, when the power distribution model is solved by the reinforcement-learning distance reward-and-punishment algorithm, the power distribution model is described as a Markov decision process. The state space comprises three parts: the current geographic position of the unmanned aerial vehicle, the past n historical actions, and the current data throughput of the n nodes. The action space is defined as the movement direction of the unmanned aerial vehicle in each time slot together with either an energy-transmission action toward the nodes or a data-transmission action toward a single node, where at each step the unmanned aerial vehicle can take only one of the energy-transmission or data-transmission actions. The reward function comprises the average achievable-rate reward function of the uplink nodes in each period and the maximum remaining-energy-difference reward function. The transition probability matrix is deterministic, with transition probability 1.
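The discrete action space described in claim 4 can be sketched as a flat set of (direction, transmission) pairs. The eight compass directions below are an illustrative assumption, since the claim does not fix the movement discretization:

```python
from itertools import product

# Hypothetical discretization: in each slot the UAV picks one movement
# direction AND exactly one transmission action -- broadcast energy to
# the nodes, or collect data from a single node k (never both, per claim 4).
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def action_space(num_nodes):
    transmissions = [("energy", None)] + [("data", k) for k in range(num_nodes)]
    return [(d, t) for d, t in product(DIRECTIONS, transmissions)]
```

With K = 6 nodes this yields 8 × (1 + 6) = 56 discrete actions, a size that a DQN-style value network handles comfortably with a deterministic (probability-1) transition model.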
5. The trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network according to claim 4, wherein the average achievable-rate reward function of the uplink nodes in each period is

r_R = (1 / (K T)) Σ_{k=1}^{K} R_k

wherein K represents the number of nodes, T represents the flight period, and R_k represents the data throughput of node k during the flight period.
6. The trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network according to claim 4, wherein the maximum remaining-energy-difference reward function is

r_E = − max_{k ∈ {1,…,K}} | E_v^k(N) − E_max |

wherein K represents the number of nodes,

E_v^k(N) = E_0^k + Σ_{n=1}^{N} [ η β_0 P z_0(n) Δt / (‖q(n) − x_k‖² + H²) − p_k(n) z_k(n) Δt ]

E_0^k represents the initial energy of node k, η represents the node's energy-reception conversion efficiency, β_0 represents the channel gain at unit distance, and P represents the fixed power at which the unmanned aerial vehicle broadcasts energy; z_0(n) and z_k(n) are binary vectors, q represents the flight coordinates of the unmanned aerial vehicle, x_k represents the coordinates of node k, H is the flight altitude of the unmanned aerial vehicle, Δt is the unit flight time slot, p_k(n) represents the uplink communication transmit power of node k in time slot n, and E_max represents the maximum battery energy of node k.
CN202310121573.9A 2023-02-16 2023-02-16 Track design and power distribution method in unmanned aerial vehicle cooperative communication network Pending CN116113025A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310121573.9A CN116113025A (en) 2023-02-16 2023-02-16 Track design and power distribution method in unmanned aerial vehicle cooperative communication network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310121573.9A CN116113025A (en) 2023-02-16 2023-02-16 Track design and power distribution method in unmanned aerial vehicle cooperative communication network

Publications (1)

Publication Number Publication Date
CN116113025A true CN116113025A (en) 2023-05-12

Family

ID=86259581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310121573.9A Pending CN116113025A (en) 2023-02-16 2023-02-16 Track design and power distribution method in unmanned aerial vehicle cooperative communication network

Country Status (1)

Country Link
CN (1) CN116113025A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550878A (en) * 2022-09-20 2022-12-30 重庆邮电大学 Unmanned aerial vehicle communication network resource allocation and deployment method supporting wireless power transmission
CN116880551A (en) * 2023-07-13 2023-10-13 之江实验室 Flight path planning method, system and storage medium based on random event capturing


Similar Documents

Publication Publication Date Title
Do et al. Deep reinforcement learning for energy-efficient federated learning in UAV-enabled wireless powered networks
You et al. Hybrid offline-online design for UAV-enabled data harvesting in probabilistic LoS channels
Cao et al. Deep reinforcement learning for multi-user access control in non-terrestrial networks
Shamsoshoara et al. An autonomous spectrum management scheme for unmanned aerial vehicle networks in disaster relief operations
CN111050286B (en) Trajectory and resource optimization method in unmanned aerial vehicle auxiliary sensor network
WO2020015214A1 (en) Optimization method for wireless information and energy transmission based on unmanned aerial vehicle
CN116113025A (en) Track design and power distribution method in unmanned aerial vehicle cooperative communication network
CN110730031A (en) Unmanned aerial vehicle track and resource allocation joint optimization method for multi-carrier communication
CN114650567B (en) Unmanned aerial vehicle auxiliary V2I network task unloading method
Yuan et al. Harnessing UAVs for fair 5G bandwidth allocation in vehicular communication via deep reinforcement learning
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
CN112564767A (en) Continuous coverage method based on self-organizing optimization cooperation in unmanned aerial vehicle network
Chen et al. Trajectory design and link selection in UAV-assisted hybrid satellite-terrestrial network
Chang et al. Machine learning-based resource allocation for multi-UAV communications system
CN113163332A (en) Road sign graph coloring unmanned aerial vehicle energy-saving endurance data collection method based on metric learning
Zhou et al. Dynamic channel allocation for multi-UAVs: A deep reinforcement learning approach
Lee et al. Multi-Agent Reinforcement Learning in Controlling Offloading Ratio and Trajectory for Multi-UAV Mobile Edge Computing
Liu et al. Learning-based multi-UAV assisted data acquisition and computation for information freshness in WPT enabled space-air-ground PIoT
Zhang et al. QoS maximization scheduling of multiple UAV base stations in 3D environment
CN113776531B (en) Multi-unmanned aerial vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network
Huang et al. Delay-oriented knowledge-driven resource allocation in sagin-based vehicular networks
Yuan et al. Joint Multi-Ground-User Edge Caching Resource Allocation for Cache-Enabled High-Low-Altitude-Platforms Integrated Network
Singh et al. Energy-efficient uav trajectory planning in rechargeable iot networks
CN116882270A (en) Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning
Wang et al. An efficient and robust UAVs’ path planning approach for timely data collection in wireless sensor networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination