
CN116113025A - Track design and power distribution method in unmanned aerial vehicle cooperative communication network - Google Patents


Info

Publication number
CN116113025A
CN116113025A (application CN202310121573.9A)
Authority
CN
China
Prior art keywords
node
energy
unmanned aerial vehicle
power distribution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310121573.9A
Other languages
Chinese (zh)
Inventor
陆永安
唐洪莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Microsystem and Information Technology of CAS
Original Assignee
Shanghai Institute of Microsystem and Information Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Microsystem and Information Technology of CAS
Priority to CN202310121573.9A
Publication of CN116113025A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/18 TPC being performed according to specific parameters
    • H04W52/24 TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
    • H04W52/241 TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/Io
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/14 Relay systems
    • H04B7/15 Active relay systems
    • H04B7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B7/18502 Airborne stations
    • H04B7/18506 Communications with or from aircraft, i.e. aeronautical mobile service
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/30 TPC using constraints in the total amount of available transmission power
    • H04W52/34 TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading
    • H04W52/343 TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading taking into account loading or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/30 TPC using constraints in the total amount of available transmission power
    • H04W52/36 TPC using constraints in the total amount of available transmission power with a discrete range or set of values, e.g. step size, ramping or offsets
    • H04W52/367 Power values between minimum and maximum limits, e.g. dynamic range
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Charge And Discharge Circuits For Batteries Or The Like (AREA)

Abstract

The invention relates to a track design and power distribution method in an unmanned aerial vehicle cooperative communication network, which comprises the following steps: establishing a power distribution model with the objective of making the final virtual residual energy of each node as close as possible to the node's maximum battery capacity; and solving the power distribution model by adopting a reinforcement learning distance reward and punishment algorithm to obtain an optimization scheme for node resource distribution and residual energy. The invention enables the node energy of the whole UAV-assisted network to be distributed more uniformly.

Description

Track design and power distribution method in unmanned aerial vehicle cooperative communication network
Technical Field
The invention relates to the technical fields of wireless communication technology and Internet of things, in particular to a track design and power distribution method in an unmanned aerial vehicle cooperative communication network.
Background
In conventional scenarios, the battery of a wireless sensor network node is limited and cannot sustain heavy energy consumption. Radio-frequency-based energy harvesting may be considered a promising approach to extending the lifetime of energy-constrained wireless sensor network nodes. In addition, massive numbers of ground network nodes have large and frequent communication needs. A wireless powered communication network (WPCN) integrates wireless energy transfer and wireless data transfer, providing a viable solution for energy-constrained wireless sensor network nodes.
The unmanned aerial vehicle has the advantages of flexible deployment, strong line-of-sight (LoS) channels to ground users, and controllable mobility, and is widely applied in cargo transportation, aerial monitoring, photography, the industrial Internet of things, and more. For example, a drone may serve as a mobile relay to facilitate information exchange between remote ground users, or as a mobile base station (BS) to enhance wireless coverage and network capacity for ground mobile users. In addition, a drone may serve as a mobile energy transmitter to charge low-power wireless sensor network nodes (WDs) on the ground. By exploiting its fully controllable mobility, the unmanned aerial vehicle can adjust its position over time to reduce its distance to the target ground user, thereby improving the data transmission and energy transmission efficiency of the wireless powered communication network (WPCN).
In the prior art, most schemes adopt a "shortest sequence WPCN" design: the unmanned aerial vehicle flies at a constant speed from the starting point to each node position, performs data transmission to the nodes during flight, and, when directly above a node, performs fixed-point energy transmission to all nodes. Meanwhile, conventional convex optimization algorithms are commonly used for the solution: successive convex approximation, block gradient descent, and penalty function methods convert the problem into a series of convex problems that are then solved iteratively. A still simpler approach, known as "static WPCN", deploys an energy-carrying communication device in the center of the network, typically with a fixed charging-and-communication schedule, e.g. a fixed 10 s of data transmission to each node followed by 10 s of energy transmission to each node.
None of the above methods takes into account the remaining energy of the nodes. However, node energy in a wireless sensor network is often imbalanced. Because the sensor nodes around a hub node also forward the data of other nodes, they consume energy faster, causing those nodes to die sooner and even interrupting the transmission path, the so-called energy hole. In addition, a sensing node is dormant most of the time, so sensor network nodes in areas with abnormal conditions have higher energy consumption and higher data transmission demands. The inconsistent residual energy of sensor network nodes therefore leads to inconsistent energy replenishment demands, and it is very necessary to replenish energy according to the differing energy demands of the nodes. On the other hand, after a node reaches its maximum energy, its remaining energy does not change even if it receives more energy; charging it again with the drone would then waste energy. Therefore, optimizing the remaining energy of the nodes jointly with the maximum battery energy allows the node energy of the whole UAV-assisted network to be distributed more uniformly.
Furthermore, the node resource allocation and residual energy optimization problem can be decoupled into two core constrained sub-problems: first, a minimum data-reception constraint on node resource allocation; second, the remaining-capacity optimization. However, because the coupling between the drone trajectory and the resource allocation design variables involves complex data throughput and energy collection functions, the problem is a non-convex mixed-integer program, and it is difficult to obtain an optimal solution. Moreover, conventional optimization algorithms such as gradient descent, coordinate descent, Newton iteration, and genetic algorithms have many defects and shortcomings. For example, when the number of nodes and the transmission space are too large, the iteration difficulty increases exponentially and the iteration speed becomes very slow. The solution result also depends strongly on the initial value: if the transmission parameters of the nodes change, for example if the initial residual energies of the nodes are inconsistent or the spatial geographic positions of the nodes differ, the model must be iteratively trained again. Such methods cannot realize the advantage of online learning, have weak generalization ability, and adapt poorly to the environment.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a track design and power distribution method in an unmanned aerial vehicle cooperative energy communication network, so that node energy across the whole UAV-assisted network is distributed more uniformly.
The technical scheme adopted to solve the technical problem is as follows: the track design and power distribution method in the unmanned aerial vehicle cooperative communication network comprises the following steps:
establishing a power distribution model with the objective of making the final virtual residual energy of each node as close as possible to the node's maximum battery capacity;
and solving the power distribution model by adopting a reinforcement learning distance reward and punishment algorithm to obtain an optimization scheme for node resource distribution and residual energy.
The objective function of the power distribution model is:

$$\min_{q,\,z,\,p}\ \sum_{k=1}^{K}\left|E_k^{v}-E_{\max}\right|$$

where the final virtual residual energy of node k is

$$E_k^{v}=E_k^{0}+\sum_{n=1}^{N}\left[\frac{\eta\beta_0 P z_0(n)\delta_t}{H^2+\|q(n)-x_k\|^2}-z_k(n)p_k(n)\delta_t\right]$$

Here K represents the number of nodes, $E_{\max}$ the maximum battery energy of node k, $E_k^0$ the initial energy of node k, N the maximum step length of the flight period, $\eta$ the energy-reception conversion efficiency of the node, $\beta_0$ the channel gain at unit distance, and P the fixed power at which the unmanned aerial vehicle broadcasts energy. $z_0(n)$ and $z_k(n)$ are binary vectors: when $z_k(n)=1$ and $z_0(n)=0$, the unmanned aerial vehicle performs uplink data communication with node k in time slot n, and when $z_k(n)=0$ and $z_0(n)=1$, it performs downlink energy broadcasting in time slot n. $q(n)$ represents the flight coordinate of the unmanned aerial vehicle, $x_k$ the coordinates of node k, H the flight altitude of the unmanned aerial vehicle, $\delta_t$ the unit flight time slot, and $p_k(n)$ the uplink communication transmit power of node k in time slot n.
The constraint conditions of the power distribution model are:

$$R_k=\frac{1}{N}\sum_{n=1}^{N}z_k(n)\log_2\!\left(1+\frac{p_k(n)\,\beta_0}{\sigma^2\left(H^2+\|q(n)-x_k\|^2\right)}\right)\ge R_{\min},\quad\forall k$$

$$E_{\max}\ge E_k(n)\ge 0,\quad\forall k,n$$

where K represents the number of nodes, T the flight period, $R_k$ the data throughput of node k during the flight period, $\sigma^2$ the power of the additive white Gaussian noise at the node, $q(n)$ the flight coordinate of the unmanned aerial vehicle in time slot n, and $R_{\min}$ a preset threshold. $E_k(n)$ represents the remaining energy of node k in time slot n, which evolves as

$$E_k(n)=E_k(n-1)+\frac{\eta\beta_0 P z_0(n)\delta_t}{H^2+\|q(n)-x_k\|^2}-z_k(n)p_k(n)\delta_t$$
When the reinforcement learning distance reward and punishment algorithm is adopted to solve the power distribution model, the power distribution model is described as a Markov decision process. The state space comprises three parts: the current geographic position of the unmanned aerial vehicle, the past n historical actions, and the data throughput of the current n nodes. The action space is defined as the movement direction of the unmanned aerial vehicle in each time slot, together with either an energy transmission action towards all nodes or a data transmission action towards a single node, where in each step the unmanned aerial vehicle can take only one of the energy transmission action and the data transmission action. The reward function comprises an average achievable rate reward function over the uplink nodes in each period and a maximum remaining energy difference reward function. The transition probability matrix is set to the deterministic probability 1.
The average achievable rate reward function of the uplink nodes in each period is

$$R_1(t)=\frac{1}{K}\sum_{k=1}^{K}R_k$$

where K represents the number of nodes, T the flight period, and $R_k$ the data throughput of node k during the flight period.

The maximum remaining energy difference reward function is

$$R_2(t)=-\sum_{k=1}^{K}\left|E_k^{v}-E_{\max}\right|$$

where $E_k^v$ is the final virtual residual energy of node k, computed from the initial energy $E_k^0$, the energy-reception conversion efficiency $\eta$, the unit-distance channel gain $\beta_0$, the fixed broadcast power P, the binary vectors $z_0(n)$ and $z_k(n)$, the flight coordinate q of the unmanned aerial vehicle, the node coordinates $x_k$, the flight altitude H, the unit flight time slot $\delta_t$, and the uplink communication transmit power $p_k(n)$ of node k in time slot n; $E_{\max}$ represents the maximum battery energy of node k.
Advantageous effects
Owing to the adoption of the above technical scheme, compared with the prior art, the invention has the following advantages and positive effects: the method adopted by the invention can perform energy transmission not only directly above the nodes but also between two nodes, with a higher proportion of energy transmission occurring between nodes. The invention optimizes the residual energy of the nodes jointly with the maximum battery energy, so that node energy across the whole UAV-assisted network is distributed more uniformly. In addition, compared with conventional methods, the total energy collection difference is lower and the energy optimization is better, so the data and energy optimization problem of the unmanned aerial vehicle WPCN network can be better satisfied, achieving energy savings with higher practicability.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a drone communication system in an embodiment of the present invention;
FIG. 3 is a flow chart of the reinforcement learning distance reward and punish algorithm in an embodiment of the present invention;
fig. 4 is a horizontal flight trajectory diagram of a drone employing an embodiment of the present invention in the examples;
FIG. 5 is a graph comparing average throughput rates using embodiments of the present invention and prior art;
FIG. 6 is a comparison graph of node energy harvesting employing an embodiment of the present invention and prior art;
fig. 7 is a graph comparing total collected energy differences using embodiments of the present invention and prior art.
Detailed Description
The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Further, it is understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents are intended to fall within the scope of the claims appended hereto.
The embodiment of the invention relates to a track design and power distribution method in an unmanned aerial vehicle cooperative communication network, as shown in fig. 1, comprising the following steps: (1) establishing a power distribution model with the objective of making the final virtual residual energy of each node as close as possible to the node's maximum battery capacity; (2) solving the power distribution model by adopting a reinforcement learning distance reward and punishment algorithm to obtain an optimization scheme for node resource distribution and residual energy. Step (1) establishes the system model of the problem; step (2) provides the reinforcement learning distance reward and punishment algorithm that solves it. The details are as follows:
In this embodiment, an unmanned-aerial-vehicle-assisted wireless sensor network is considered, as shown in fig. 2. The coordinates of the sensing nodes are $x_k\in\mathbb{R}^{2}$, $k=1,\dots,K$. A rotor unmanned aerial vehicle carrying energy-and-information communication equipment flies at a fixed altitude H with flight period T; the unit flight time slot is $\delta_t=T/N$, where N is the maximum step length of the flight period, and the flight coordinate of the unmanned aerial vehicle in any time slot n is $q(n)$, $n=1,\dots,N$.
The core task of the wireless sensor network is data transmission. The communication channel between the unmanned aerial vehicle and a node is assumed to be a line-of-sight link, and the Doppler effect caused by the movement of the unmanned aerial vehicle is assumed to be perfectly compensated. According to the free-space path loss model, the channel gain from the unmanned aerial vehicle to node k in time slot n is

$$h_k(n)=\frac{\beta_0}{H^2+\|q(n)-x_k\|^2}$$

where $\beta_0$ represents the channel gain at unit distance (1 meter). Let $p_k(n)$ denote the uplink communication transmit power of node k in time slot n; the data transmission rate of node k in time slot n is then

$$R_k(n)=\log_2\!\left(1+\frac{p_k(n)\,h_k(n)}{\sigma^2}\right)$$

where $\sigma^2$ represents the power of the additive white Gaussian noise (AWGN) at the node. Thus the data throughput of each node during the flight period is

$$R_k=\frac{1}{N}\sum_{n=1}^{N}z_k(n)\,R_k(n)$$
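As a minimal numerical sketch of the free-space channel gain, per-slot rate, and average throughput defined above (the constants `BETA_0`, `SIGMA2`, and `H` are illustrative assumptions, not the patent's simulation settings):

```python
import math

BETA_0 = 1e-3   # assumed channel gain at 1 m (about -30 dB)
SIGMA2 = 1e-11  # assumed AWGN power at the node (about -80 dBm)
H = 5.0         # assumed UAV flight altitude in metres

def channel_gain(q, x_k):
    """h_k(n) = beta_0 / (H^2 + ||q(n) - x_k||^2), free-space path loss."""
    d2 = (q[0] - x_k[0]) ** 2 + (q[1] - x_k[1]) ** 2
    return BETA_0 / (H ** 2 + d2)

def rate_bps_hz(q, x_k, p_k):
    """Achievable uplink rate of node k in one slot: log2(1 + p*h/sigma^2)."""
    return math.log2(1.0 + p_k * channel_gain(q, x_k) / SIGMA2)

def throughput(traj, x_k, p, z_k):
    """Average rate over N slots; z_k[n] = 1 only in slots assigned to node k."""
    N = len(traj)
    return sum(z_k[n] * rate_bps_hz(traj[n], x_k, p[n]) for n in range(N)) / N
```

For example, hovering directly above a node (`q == x_k`) maximizes the gain $\beta_0/H^2$, and the rate falls off as the squared horizontal distance grows.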
Owing to the broadcast nature of wireless transmission, all nodes receive energy simultaneously when the unmanned aerial vehicle broadcasts energy in the downlink. The energy-and-information communication of the unmanned aerial vehicle is assumed to use time division multiplexing: for any time slot n of node k there exist binary vectors $z_k(n)$ and $z_0(n)$. When $z_k(n)=1$ and $z_0(n)=0$, the unmanned aerial vehicle performs uplink data communication with node k in time slot n; when $z_k(n)=0$ and $z_0(n)=1$, it performs downlink energy broadcasting in time slot n. This strategy implies

$$z_0(n)+\sum_{k=1}^{K}z_k(n)=1,\quad\forall n$$
Assume the unmanned aerial vehicle broadcasts energy at a fixed power P, the energy-reception conversion efficiency of a node is $\eta$, the initial energy of node k is $E_k^0$, and the maximum battery energy of a node is $E_{\max}$. In time slot n, node k may both harvest energy from the unmanned aerial vehicle and consume energy for data transmission, so the current remaining energy $E_k(n)$ is:

$$E_k(n)=E_k(n-1)+\frac{\eta\beta_0 P z_0(n)\delta_t}{H^2+\|q(n)-x_k\|^2}-z_k(n)p_k(n)\delta_t,\qquad E_k(0)=E_k^0$$
In order to balance the energy of the network nodes, the final virtual remaining energy of node k, $E_k^v$, is defined as:

$$E_k^{v}=E_k^{0}+\sum_{n=1}^{N}\left[\frac{\eta\beta_0 P z_0(n)\delta_t}{H^2+\|q(n)-x_k\|^2}-z_k(n)p_k(n)\delta_t\right]$$

Unlike $E_k(n)$, the final virtual residual energy $E_k^v$ is not limited by the maximum battery energy $E_{\max}$. The closer a node's final virtual remaining energy is to the maximum battery energy, the more balanced the energy of the network nodes and the less energy is wasted. For node k, how close the final virtual remaining energy is to the maximum battery capacity can be measured by the absolute value of their difference.
The aim of this embodiment is to jointly optimize the flight trajectory and the power distribution of the unmanned aerial vehicle so that, over the whole charging process, the final virtual residual energy of each node is closest to the maximum capacity of its battery. The problem can thus be expressed as:

$$\min_{q,\,z,\,p}\ \sum_{k=1}^{K}\left|E_k^{v}-E_{\max}\right|$$

$$\text{s.t.}\quad R_k\ge R_{\min},\quad\forall k\qquad(Q.1)$$

$$E_{\max}\ge E_k(n)\ge 0\qquad(Q.2)$$

$$z_0(n)+\sum_{k=1}^{K}z_k(n)=1,\quad z_0(n),z_k(n)\in\{0,1\}\qquad(Q.3)$$

$$p_k(n)\ge 0\qquad(Q.4)$$

The first constraint (Q.1) requires that the data throughput the unmanned aerial vehicle collects from each node in each period exceed a threshold $R_{\min}$; the second constraint (Q.2) requires that the remaining energy $E_k(n)$ in every time slot n neither exceed the maximum battery energy $E_{\max}$ of the node nor fall below 0.
The reinforcement learning problem may be described as a Markov decision process (MDP) defined by a 4-tuple {S, A, P, R}. Specifically, $S=\{s_1,s_2,\dots,s_m\}$ is the state space; $A=\{a_1,a_2,\dots,a_m\}$ represents the action space; R represents the reward function, where R(s, a) is the reward for performing action a in state s; and P represents the transition probability matrix. The optimal policy is obtained through the interaction of the RL agent with the environment. The whole interaction process is as follows: the RL agent observes the environment and obtains the current state $s_t\in S$. In this state it selects and performs an action $a_t\in A$, after which the environment moves to state $s_{t+1}$; at the end of the step, the RL agent receives a reward $r_t$ according to the circumstances.
In this embodiment, the 4-tuple {S, A, P, R} is defined as follows:

1) State space S(t): in an MDP the system state should be observable and accessible; a well-chosen state allows the model to converge faster, and the end result can converge to sub-optimal or even optimal values. The state space S consists of three parts: 1. the current geographic position of the unmanned aerial vehicle, $V(t)=\{x(t),y(t)\}$; 2. the past n historical actions, $A_n(t)=\{a_t,a_{t-1},\dots,a_{t-n}\}$; 3. the data throughput of the current n nodes, $G_n(t)=\{g_1(t),g_2(t),\dots,g_n(t)\}$. Thus the state space is denoted $S(t)=\{V(t),A_n(t),G_n(t)\}$.

2) Action space A(t): the action of the drone may be defined as its direction of motion in each time slot, $A_1=\{\text{up, down, left, right, stop}\}$, together with an energy transmission action $A_2$ towards all nodes or a data transmission action $A_3$ towards a single node. In each step the drone can take only one of the energy transmission action $A_2$ and the data transmission action $A_3$. Thus $A(t)=\{A_1,(A_2\text{ or }A_3)\}$.

3) Reward function R(t): since this embodiment focuses on an MDP with a limited time range, i.e. a movement of T steps, the reward of each step may consist of the total uplink rate in each time slot and the maximum remaining energy difference of the nodes; the key factors affecting the reward size are the UAV-to-node distance change process and the path selection process. The average achievable rate reward function of the uplink in each period is

$$R_1(t)=\frac{1}{K}\sum_{k=1}^{K}R_k$$

and the maximum remaining energy difference reward function is

$$R_2(t)=-\sum_{k=1}^{K}\left|E_k^{v}-E_{\max}\right|$$

Thus the reward function is $R(t)=\{R_1(t),R_2(t)\}$.

4) Transition probability matrix P: the state transition probability $P(S_{t+1}\mid S_t,A_t)$ is defined as the probability distribution over the next state given the current state and the action taken, and characterizes the dynamics of the overall system. In the present embodiment, for convenience, the transition probability matrix is set to the deterministic probability 1: from the current state $S_t$, performing action $A_t$ can only reach the unique state $S_{t+1}$.
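A minimal environment sketch of this MDP with deterministic transitions, the five movement directions, and the mutually exclusive charge-or-communicate choice per step. The grid size, node layout, and the toy distance-based rate proxy are illustrative assumptions, not the patent's model:

```python
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0),
         "right": (1, 0), "stop": (0, 0)}

class UavWpcnEnv:
    """Toy MDP environment: state = (position V(t), recent actions A_n(t),
    per-node throughputs G_n(t)); transitions are deterministic (P = 1)."""

    def __init__(self, nodes, grid=20):
        self.nodes, self.grid = nodes, grid
        self.reset()

    def reset(self):
        self.pos = [0, 0]
        self.history = []                          # past actions A_n(t)
        self.throughput = [0.0] * len(self.nodes)  # G_n(t)
        return self.state()

    def state(self):
        return (tuple(self.pos), tuple(self.history[-3:]),
                tuple(round(g, 3) for g in self.throughput))

    def step(self, move, target):
        # target is None for a downlink energy broadcast (A_2), or a node
        # index for uplink data transmission (A_3) -- never both in one step.
        dx, dy = MOVES[move]
        self.pos[0] = min(self.grid, max(0, self.pos[0] + dx))
        self.pos[1] = min(self.grid, max(0, self.pos[1] + dy))
        if target is not None:
            x, y = self.nodes[target]
            d2 = (self.pos[0] - x) ** 2 + (self.pos[1] - y) ** 2
            self.throughput[target] += 1.0 / (1.0 + d2)  # toy rate proxy
        self.history.append((move, target))
        reward = sum(self.throughput) / len(self.nodes)  # R_1-style term
        return self.state(), reward
```

A full implementation would replace the toy rate proxy with the $\log_2$ rate model and add the $R_2$ energy-balance term to the reward.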
Obviously, the action space of the MDP defined above is discrete, the transition probability matrix P is deterministic, and the state-action space is huge, so table-based reinforcement learning methods are not applicable. That is, the MDP problem of node resource allocation and residual energy optimization should be solved with a DQN algorithm suited to discrete action spaces in reinforcement learning (see fig. 3). Finally, combining a distance reward and punishment function, an RD-DQN algorithm is proposed; its pseudocode is given as a figure in the original document.
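Since the RD-DQN pseudocode survives only as a figure, the following is a deliberately simplified stand-in: tabular Q-learning with a distance-based reward shaping term (the "RD" idea of rewarding proximity to the target node). A faithful RD-DQN would replace the Q-table with a neural network plus experience replay; the node layout, hyperparameters, and reward are all illustrative assumptions:

```python
import random

random.seed(0)

NODES = [(1, 1), (6, 2), (3, 5)]                       # assumed node layout
ACTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0), (0, 0)]   # up/down/left/right/stop

def shaped_reward(pos, target):
    """Distance reward-and-punishment: closer to the target node is better."""
    return -(abs(pos[0] - target[0]) + abs(pos[1] - target[1]))

def train(episodes=200, steps=20, alpha=0.5, gamma=0.9, eps=0.2):
    """Epsilon-greedy tabular Q-learning toward a fixed target node."""
    q = {}
    for _ in range(episodes):
        pos, target = (0, 0), NODES[0]
        for _ in range(steps):
            key = (pos, target)
            q.setdefault(key, [0.0] * len(ACTIONS))
            a = (random.randrange(len(ACTIONS)) if random.random() < eps
                 else max(range(len(ACTIONS)), key=lambda i: q[key][i]))
            nxt = (pos[0] + ACTIONS[a][0], pos[1] + ACTIONS[a][1])
            r = shaped_reward(nxt, target)
            nkey = (nxt, target)
            q.setdefault(nkey, [0.0] * len(ACTIONS))
            # Standard temporal-difference update toward r + gamma * max Q(s')
            q[key][a] += alpha * (r + gamma * max(q[nkey]) - q[key][a])
            pos = nxt
    return q
```

After training, the greedy policy at the start state prefers moving toward the node over stopping, illustrating how the distance shaping steers the trajectory.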
an example of the application of the present method to an actual scene is listed below.
For the unmanned aerial vehicle WPCN communication system, the number of nodes is set to K = 6, the flight altitude of the unmanned aerial vehicle is H = 5 m, the maximum horizontal flight speed is $V_L$ = 10 m/s, the maximum transmit power of the energy-carrying unmanned aerial vehicle is P = 10 W, and the flight starting point of the unmanned aerial vehicle is $q_0=[0,0,5]^T$. The node noise power is $\sigma^2$ = −80 dBm, the channel gain at distance 1 m is $\beta_0$ = −30 dB, the time slot interval is $\delta_t$ = 0.1 s, the transmit power of a node to the unmanned aerial vehicle is $Q_k$ = 0.0001 W, the energy collection efficiency is η = 50%, the node maximum energy is $E_{\max}$ = 0.004 J, and the node initial energy $E_i$ obeys a uniform distribution (its probability density function is given as a figure in the original document). The nodes are distributed on a plane of length and width 20 m, with initial positions $K_1=[1,1]^T$ m, $K_2=[6,2]^T$ m, $K_3=[12,5]^T$ m, $K_4=[15,10]^T$ m, $K_5=[10,15]^T$ m, $K_6=[14,18]^T$ m, and the node minimum average throughput rate requirement is $R_{\min}$ = 0.75 bps/Hz. In the simulation, the nodes are numbered node 1, node 2, …, node 6 in ascending order of their positions on the x-axis, and the unmanned aerial vehicle records the data transmission condition $R_k$ and the energy harvesting condition $E^v$ of each node at each moment. Two comparison schemes are considered. In the first, the unmanned aerial vehicle flies at a constant speed from the starting point to each node position, performs data transmission to the nodes during flight, and performs fixed-point energy transmission when directly above a node; this model is named "shortest sequence WPCN". In the second, the unmanned aerial vehicle is deployed at the network center $q=[10,10,5]^T$ m and performs static data transmission and energy transmission; this is named "static WPCN". The scheme proposed by this embodiment is named "RD-DQN-WPCN".
Fig. 4 is the horizontal flight trajectory diagram of the "RD-DQN-WPCN" unmanned aerial vehicle. It can be seen that the flight trajectory roughly follows the node distribution, matching the expected behaviour. Moreover, the algorithm can transmit energy not only directly above the nodes but also between two nodes, with a higher proportion of energy transmission occurring between nodes; this contrasts clearly with the "shortest sequence WPCN", which transmits energy only directly above the nodes, showing that the proposed algorithm is more flexible.
Fig. 5 compares the average throughput rates of the three "WPCN" schemes. Only the proposed "RD-DQN-WPCN" scheme meets the minimum average throughput rate requirement $R_{\min}$ = 0.75 bps/Hz. By contrast, the "static WPCN" result is the worst: the average throughput rate meets the requirement only for nodes 3–5, while nodes 1 and 6 reach only 0.57 bps/Hz and 0.65 bps/Hz, the average throughput rate being inversely related to the distance from the node to the network center. The "shortest sequence WPCN" achieves the highest average throughput rates for nodes 1–5, but node 6 does not meet the minimum requirement: because the transmission time is fixed and the earlier nodes occupy a slightly higher share of the time when transmitting their data, less transmission time remains for the end node, whose average throughput rate is only 0.72 bps/Hz.
Fig. 6 compares the virtual residual energy $E^v$ of each node, where the vertical axis is the amount of $E^v$ in joules (J). As can be seen from fig. 6, compared with the other two schemes, under the proposed "RD-DQN-WPCN" scheme the virtual residual energy $E^v$ of every node except node 1 is closer to the maximum node energy $E_{\max}$, fluctuating around $E_{\max}$ with the smallest variation amplitude.
Fig. 7 compares the node energy harvesting of the three schemes. The total collected energy difference $E_{dif}$ is defined as the sum over nodes of the absolute difference between the single-node virtual residual energy $E^v$ and the maximum energy $E_{\max}$:

$$E_{dif}=\sum_{k=1}^{K}\left|E_k^{v}-E_{\max}\right|$$
As can be seen from fig. 7, the total collected energy difference is lowest for the "RD-DQN-WPCN" scheme, at only 0.0013711 J. The "shortest sequence WPCN" reaches 0.0022849 J, an energy difference 66.64% higher than the proposed scheme, and the "static WPCN" reaches 0.0055924 J, about 4.08 times (307.9% higher than) the proposed scheme. The energy received by the nodes under the two comparison schemes far exceeds the maximum battery capacity, so their energy utilization is lower. This further shows that "RD-DQN-WPCN" is optimal in energy optimization and can better satisfy the data and energy optimization problem of the unmanned aerial vehicle WPCN network, thereby achieving energy savings with higher practicability.
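The comparison metric above reduces to a one-line computation; a minimal sketch (the sample residual-energy values in the usage note are illustrative, not the paper's measurements):

```python
E_MAX = 0.004  # assumed maximum battery energy in joules

def energy_difference(virtual_residuals, e_max=E_MAX):
    """E_dif: total deviation of final virtual residual energies from E_max."""
    return sum(abs(e - e_max) for e in virtual_residuals)
```

For example, `energy_difference([0.0039, 0.0041, 0.0035])` sums the three per-node gaps (0.0001 + 0.0001 + 0.0005 J); a perfectly balanced network, with every node exactly at `E_MAX`, scores 0.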

Claims (6)

1. A trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network, characterized by comprising the following steps:
establishing a power distribution model with the objective that the final virtual residual energy of each node is as close as possible to the node's maximum battery capacity;
solving the power distribution model with a reinforcement-learning distance reward-and-punishment algorithm to obtain an optimization scheme for node resource allocation and residual energy.
2. The trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network according to claim 1, wherein the objective function of the power distribution model is:

min Σ_{k=1}^{K} | E_v^k(N) − E_max |

wherein E_v^k(N) represents the final virtual residual energy of node k, E_max represents the maximum battery energy of node k, and

E_v^k(N) = E_0^k + Σ_{n=1}^{N} [ η β_0 P z_0(n) Δt / (‖q(n) − x_k‖² + H²) − p_k(n) z_k(n) Δt ]

where E_0^k represents the initial energy of node k, N represents the maximum number of steps in the flight period, η represents the node's energy-reception conversion efficiency, β_0 represents the channel gain at unit distance, and P represents the fixed power at which the unmanned aerial vehicle broadcasts energy; z_0(n) and z_k(n) are binary vectors: when z_k(n) = 1 and z_0(n) = 0, the unmanned aerial vehicle performs uplink data communication with node k in time slot n, and when z_k(n) = 0 and z_0(n) = 1, the unmanned aerial vehicle performs downlink energy broadcasting in time slot n; q represents the flight coordinates of the unmanned aerial vehicle, x_k represents the coordinates of node k, H is the flight altitude of the unmanned aerial vehicle, Δt is the unit flight time slot, and p_k(n) represents the uplink communication transmit power of node k in time slot n.
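The energy bookkeeping in claim 2 can be sketched in a few lines. This is a hedged reconstruction assuming a free-space channel gain β_0/d² with d² = ‖q − x_k‖² + H², as the variable definitions suggest; all numeric parameters a caller passes in are illustrative:

```python
def virtual_residual_energy(e_init, q, x_k, z0, zk, p_k,
                            eta, beta0, P, H, dt):
    """Virtual residual energy E_v^k(N): initial energy plus downlink
    energy harvested in broadcast slots, minus uplink transmit energy
    in data slots, without capping at the battery maximum E_max."""
    e_v = e_init
    for n in range(len(q)):
        # Squared UAV-to-node distance, including flight altitude H.
        d2 = (q[n][0] - x_k[0]) ** 2 + (q[n][1] - x_k[1]) ** 2 + H ** 2
        e_v += eta * beta0 * P * z0[n] * dt / d2  # harvested (z0 = 1 slots)
        e_v -= p_k[n] * zk[n] * dt                # spent on uplink (zk = 1 slots)
    return e_v
```

Because E_v is not clipped at E_max, the objective |E_v^k(N) − E_max| penalizes both over-charging and under-charging a node.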
3. The trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network according to claim 2, wherein the constraint conditions of the power distribution model are:

R_k / T ≥ R_min, ∀ k ∈ {1, …, K};  E_k(n) ≥ 0, ∀ k, n

wherein K represents the number of nodes, T represents the flight period, R_k represents the data throughput of node k during the flight period,

R_k = Σ_{n=1}^{N} z_k(n) log₂( 1 + p_k(n) β_0 / ( σ² (‖q(n) − x_k‖² + H²) ) ) Δt

σ² represents the power of the additive white Gaussian noise at the node, q(n) represents the flight coordinates of the unmanned aerial vehicle in time slot n, and R_min represents a preset threshold; E_k(n) represents the remaining energy of node k in time slot n,

E_k(n) = E_0^k + Σ_{i=1}^{n} [ η β_0 P z_0(i) Δt / (‖q(i) − x_k‖² + H²) − p_k(i) z_k(i) Δt ].
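The throughput constraint of claim 3 can be checked with a short sketch. This assumes the standard log₂(1 + SNR) rate with the same free-space path loss as above; the rate expression is a reconstruction from the listed variables, not a verbatim copy of the patent's formula:

```python
import math

def average_rate(q, x_k, zk, p_k, beta0, sigma2, H, dt):
    """Average uplink spectral efficiency of node k (bps/Hz):
    throughput R_k accumulated over data slots, divided by the
    flight period T = N * dt."""
    N = len(q)
    throughput = 0.0
    for n in range(N):
        if zk[n]:
            d2 = (q[n][0] - x_k[0]) ** 2 + (q[n][1] - x_k[1]) ** 2 + H ** 2
            snr = p_k[n] * beta0 / (sigma2 * d2)
            throughput += math.log2(1.0 + snr) * dt
    return throughput / (N * dt)
```

A feasible solution must keep `average_rate(...) >= R_min` (0.75 bps/Hz in the simulations of Fig. 5) for every node while the remaining energy E_k(n) stays non-negative.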
4. The trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network according to claim 1, wherein, when the power distribution model is solved by the reinforcement-learning distance reward-and-punishment algorithm, the power distribution model is described as a Markov decision process. The state space comprises three parts: the current geographic position of the unmanned aerial vehicle, the past n historical actions, and the current data throughput of the n nodes. The action space is defined as the movement direction of the unmanned aerial vehicle in each time slot together with either an energy-transmission action toward the nodes or a data-transmission action toward a single node, where at each step the unmanned aerial vehicle can take only one of the energy-transmission or data-transmission actions. The reward function comprises the average achievable-rate reward function of the uplink nodes in each period and the maximum remaining-energy-difference reward function. The transition probability matrix is deterministic, with transition probability 1.
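The discrete action space described in claim 4 can be sketched as a flat set of (direction, transmission) pairs. The eight compass directions below are an illustrative assumption, since the claim does not fix the movement discretization:

```python
from itertools import product

# Hypothetical discretization: in each slot the UAV picks one movement
# direction AND exactly one transmission action -- broadcast energy to
# the nodes, or collect data from a single node k (never both, per claim 4).
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def action_space(num_nodes):
    transmissions = [("energy", None)] + [("data", k) for k in range(num_nodes)]
    return [(d, t) for d, t in product(DIRECTIONS, transmissions)]
```

With K = 6 nodes this yields 8 × (1 + 6) = 56 discrete actions, a size that a DQN-style value network handles comfortably with a deterministic (probability-1) transition model.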
5. The trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network according to claim 4, wherein the average achievable-rate reward function of the uplink nodes in each period is

r_R = (1 / (K T)) Σ_{k=1}^{K} R_k

wherein K represents the number of nodes, T represents the flight period, and R_k represents the data throughput of node k during the flight period.
6. The trajectory design and power distribution method in an unmanned aerial vehicle cooperative communication network according to claim 4, wherein the maximum remaining-energy-difference reward function is

r_E = − max_{k ∈ {1,…,K}} | E_v^k(N) − E_max |

wherein K represents the number of nodes,

E_v^k(N) = E_0^k + Σ_{n=1}^{N} [ η β_0 P z_0(n) Δt / (‖q(n) − x_k‖² + H²) − p_k(n) z_k(n) Δt ]

E_0^k represents the initial energy of node k, η represents the node's energy-reception conversion efficiency, β_0 represents the channel gain at unit distance, and P represents the fixed power at which the unmanned aerial vehicle broadcasts energy; z_0(n) and z_k(n) are binary vectors, q represents the flight coordinates of the unmanned aerial vehicle, x_k represents the coordinates of node k, H is the flight altitude of the unmanned aerial vehicle, Δt is the unit flight time slot, p_k(n) represents the uplink communication transmit power of node k in time slot n, and E_max represents the maximum battery energy of node k.
CN202310121573.9A 2023-02-16 2023-02-16 Track design and power distribution method in unmanned aerial vehicle cooperative communication network Pending CN116113025A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310121573.9A CN116113025A (en) 2023-02-16 2023-02-16 Track design and power distribution method in unmanned aerial vehicle cooperative communication network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310121573.9A CN116113025A (en) 2023-02-16 2023-02-16 Track design and power distribution method in unmanned aerial vehicle cooperative communication network

Publications (1)

Publication Number Publication Date
CN116113025A true CN116113025A (en) 2023-05-12

Family

ID=86259581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310121573.9A Pending CN116113025A (en) 2023-02-16 2023-02-16 Track design and power distribution method in unmanned aerial vehicle cooperative communication network

Country Status (1)

Country Link
CN (1) CN116113025A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550878A (en) * 2022-09-20 2022-12-30 重庆邮电大学 Unmanned aerial vehicle communication network resource allocation and deployment method supporting wireless power transmission
CN116880551A (en) * 2023-07-13 2023-10-13 之江实验室 Flight path planning method, system and storage medium based on random event capturing


Similar Documents

Publication Publication Date Title
Do et al. Deep reinforcement learning for energy-efficient federated learning in UAV-enabled wireless powered networks
You et al. Hybrid offline-online design for UAV-enabled data harvesting in probabilistic LoS channels
Cao et al. Deep reinforcement learning for multi-user access control in non-terrestrial networks
Shamsoshoara et al. An autonomous spectrum management scheme for unmanned aerial vehicle networks in disaster relief operations
CN111050286B (en) Trajectory and resource optimization method in unmanned aerial vehicle auxiliary sensor network
WO2020015214A1 (en) Optimization method for wireless information and energy transmission based on unmanned aerial vehicle
CN116113025A (en) Track design and power distribution method in unmanned aerial vehicle cooperative communication network
CN110730031A (en) Unmanned aerial vehicle track and resource allocation joint optimization method for multi-carrier communication
CN114650567B (en) Unmanned aerial vehicle auxiliary V2I network task unloading method
Yuan et al. Harnessing UAVs for fair 5G bandwidth allocation in vehicular communication via deep reinforcement learning
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
CN112564767A (en) Continuous coverage method based on self-organizing optimization cooperation in unmanned aerial vehicle network
Chen et al. Trajectory design and link selection in UAV-assisted hybrid satellite-terrestrial network
Chang et al. Machine learning-based resource allocation for multi-UAV communications system
CN113163332A (en) Road sign graph coloring unmanned aerial vehicle energy-saving endurance data collection method based on metric learning
Zhou et al. Dynamic channel allocation for multi-UAVs: A deep reinforcement learning approach
Lee et al. Multi-Agent Reinforcement Learning in Controlling Offloading Ratio and Trajectory for Multi-UAV Mobile Edge Computing
Liu et al. Learning-based multi-UAV assisted data acquisition and computation for information freshness in WPT enabled space-air-ground PIoT
Zhang et al. QoS maximization scheduling of multiple UAV base stations in 3D environment
CN113776531B (en) Multi-unmanned aerial vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network
Huang et al. Delay-oriented knowledge-driven resource allocation in sagin-based vehicular networks
Yuan et al. Joint Multi-Ground-User Edge Caching Resource Allocation for Cache-Enabled High-Low-Altitude-Platforms Integrated Network
Singh et al. Energy-efficient uav trajectory planning in rechargeable iot networks
CN116882270A (en) Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning
Wang et al. An efficient and robust UAVs’ path planning approach for timely data collection in wireless sensor networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination