CN111212438B - Resource allocation method of wireless energy-carrying communication technology - Google Patents
Resource allocation method of wireless energy-carrying communication technology
- Publication number
- CN111212438B CN111212438B CN202010113438.6A CN202010113438A CN111212438B CN 111212438 B CN111212438 B CN 111212438B CN 202010113438 A CN202010113438 A CN 202010113438A CN 111212438 B CN111212438 B CN 111212438B
- Authority
- CN
- China
- Prior art keywords
- resource allocation
- user
- energy
- users
- decision process
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W24/00—Supervisory, monitoring or testing arrangements
- H04W24/02—Arrangements for optimising operational condition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/06—TPC algorithms
- H04W52/14—Separate analysis of uplink or downlink
- H04W52/143—Downlink power control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/18—TPC being performed according to specific parameters
- H04W52/24—TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters
- H04W52/241—TPC being performed according to specific parameters using SIR [Signal to Interference Ratio] or other wireless path parameters taking into account channel quality metrics, e.g. SIR, SNR, CIR, Eb/lo
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/18—TPC being performed according to specific parameters
- H04W52/26—TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service]
- H04W52/265—TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service] taking into account the quality of service QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/18—TPC being performed according to specific parameters
- H04W52/26—TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service]
- H04W52/267—TPC being performed according to specific parameters using transmission rate or quality of service QoS [Quality of Service] taking into account the information rate
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. TPC [Transmission Power Control], power saving or power classes
- H04W52/04—TPC
- H04W52/30—TPC using constraints in the total amount of available transmission power
- H04W52/34—TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading
Abstract
The invention discloses a resource allocation method for a wireless energy-carrying downlink communication scenario based on the pattern division multiple access technology. By proposing a Q-learning algorithm built on a constrained Markov decision process, the method minimizes the total transmission power of the transmitting end while guaranteeing the quality of service of all users, where quality of service comprises each user's minimum received-energy requirement and minimum data-rate requirement. Simulations verify that the proposed resource allocation strategy significantly reduces the total transmission power of the transmitting end.
Description
[ technical field ]
The invention belongs to the field of wireless energy-carrying communication, and particularly relates to a resource allocation method for a wireless energy-carrying downlink communication scenario based on the pattern division multiple access technology.
[ background of the invention ]
Wireless energy-carrying communication is a new type of wireless communication that combines wireless power transfer with wireless information transmission, delivering energy while realizing reliable information exchange. With its rapid development, many drawbacks of the traditional power supply mode, such as aging wires and batteries that are difficult to replace in time, can be alleviated. However, saving power and improving spectrum utilization in wireless energy-carrying communication systems remain challenging.
In addition, non-orthogonal multiple access is a promising 5G technology that can meet the low-power, high-throughput, low-latency, and wide-coverage requirements of next-generation mobile communication systems. Its high spectral efficiency and massive connectivity match the explosive data growth and access demands of the 5G era. Among non-orthogonal multiple access schemes, pattern division multiple access can fully exploit multi-dimensional domain processing and offers high coding flexibility, a wide application range, and low complexity. Applying pattern division multiple access to wireless energy-carrying communication can therefore effectively improve spectrum utilization and energy efficiency. User quality of service here includes the minimum received-energy requirement and the minimum data-rate requirement of each receiving-end user. An effective tool is thus needed to address these challenges.
In recent years, how to design efficient resource allocation methods for wireless energy-carrying communication systems has been increasingly discussed. Existing methods are general and can guarantee user quality of service, but they do not minimize the power consumption of the transmitting end. Traditional approaches also suffer from high computational complexity and many constraints when minimizing the total transmission power of the transmitting end in a wireless energy-carrying downlink scenario based on the pattern division multiple access technology, especially when the receiving end has multiple users whose quality of service must all be satisfied.
[ summary of the invention ]
The invention aims to provide a resource allocation method for a wireless energy-carrying downlink communication scenario based on the pattern division multiple access technology, so as to reduce the high computational complexity of minimizing the total transmission power of the transmitting end while satisfying the quality of service of the receiving-end users.
The technical scheme adopted by the invention is a resource allocation method for the wireless energy-carrying downlink communication scenario based on the pattern division multiple access technology, implemented according to the following steps:
step one, formulating a constrained Markov decision process:
describing the resource allocation problem in the wireless energy-carrying communication scenario based on the pattern division multiple access technology as a constrained Markov decision process, and converting the problem into an unconstrained Markov decision process by using the Lagrangian dual method;
step two, solving the unconstrained Markov decision process of step one with a reinforcement learning method to obtain the optimal resource allocation strategy; the objective of this strategy is to minimize the total transmission power of the transmitting end while satisfying the quality of service of each user at the receiving end.
Further, the wireless energy-carrying downlink communication scenario is constructed as a system model, and the system model specifically includes:
a base station wirelessly transmits data and energy to T users in a specific area through K subcarriers, where the transmitting end adopts superposition coding, the receiving end adopts successive interference cancellation, and both the base station at the transmitting end and the users at the receiving end are equipped with a single antenna; the users are randomly distributed within a circle of radius r centered at the base station.
Further, the first step specifically comprises:
1) according to the system model, defining a state space and an action space of the system:
the state space of the system is specifically as follows:
s = (SINR_{k,t}, k = 0, 1, ..., K, t = 0, 1, ..., T) ∈ S = SINR (1),
where SINR_{k,t} is the signal-to-interference-plus-noise ratio when the k-th subcarrier is loaded to the t-th user, and the state set SINR is a finite set of SINR values;
the action space of the system is specifically as follows:
a = (α, G_PDMA, P_PDMA) (2),
where α = (α_1, ..., α_T) is the vector of transmission-time ratios assigned to information decoding by the T users, P_PDMA is the power allocation matrix, and G_PDMA is the subcarrier mapping matrix; α ∈ A, G_PDMA ∈ G, and P_PDMA ∈ P indicate that the vector and the matrices belong to the finite sets of transmission-time ratios allocated to information decoding, subcarrier mappings, and power allocations, respectively;
2) The constrained Markov decision process is detailed as follows:
(P1): min_Π P_total (3)
s.t. E_t ≥ E_req, ∀t (4)
R_t ≥ R_req, ∀t (5)
where P_total is the total transmission power of the transmitting end; equations (4) and (5) represent the quality-of-service constraints of each user, i.e. the energy E_t received by each user and its data rate R_t are required to satisfy the minimum energy requirement E_req and the minimum data rate requirement R_req, respectively; the Markov decision process is thus described as adjusting the actions α, G_PDMA, P_PDMA to minimize the total transmission power of the transmitting end under the constraint of satisfying the quality of service of each user;
the constrained Markov decision process can be relaxed to an unconstrained Markov process, i.e.:
min_Π max_{λ≥0, μ≥0} L(λ, μ, Π) = P_total + Σ_{t=1}^{T} λ_t (E_req − E_t) + Σ_{t=1}^{T} μ_t (R_req − R_t),
where λ and μ are the two sets of Lagrange multipliers; Π* is the optimal resource allocation strategy, and finding it is converted into finding a saddle point of the function L(λ, μ, Π).
Further, in the second step, the updating formula of the Q value in reinforcement learning is specifically as follows:
Q(s_k, a_k) ← (1 − ρ) Q(s_k, a_k) + ρ [ r_{k+1} + γ max_a Q(s_{k+1}, a) ],
where r_{k+1}, γ, and ρ (0 < ρ < 1) are the reward obtained at time k + 1, the reward discount factor, and the learning rate, respectively;
the optimum function is expressed as follows:
π*(s) = argmax_a Q*(s, a),
where Q*(s, a) is the Q value obtained when the optimal policy is followed for state s and action a.
The beneficial effects of the invention are as follows:
1. The invention provides a resource allocation method for a wireless energy-carrying downlink communication scenario based on the pattern division multiple access technology. Taking the time-switching receiver as an example, the minimum total transmission power of the transmitting end is obtained by jointly optimizing the time-slot ratio that each receiver allocates between energy reception and information decoding, the subcarrier mapping matrix, and the power allocation matrix.
2. To address the difficulty of solving the constrained Markov decision process directly, Lagrangian dual theory is used to convert it into an unconstrained Markov decision process, and a Q-learning algorithm in reinforcement learning is then applied to obtain the optimal strategy of the Markov decision process.
3. The effectiveness of the method is verified through experiments; compared with other methods, the proposed method achieves a lower total transmission power at the transmitting end.
[ description of the drawings ]
Fig. 1 is a diagram of the system model in the wireless energy-carrying downlink communication scenario based on the pattern division multiple access technology according to the present invention;
FIG. 2 is a schematic diagram illustrating the variation of the total transmission power with the number of iterations in the embodiment;
FIG. 3 is a performance comparison between the DBN algorithm and the proposed Q-learning algorithm under different user data-rate requirements in the embodiment;
FIG. 4 is a performance comparison between the DBN algorithm and the proposed Q-learning algorithm under different user received-energy requirements in the embodiment;
Fig. 5 is a comparison of the minimum total transmission power at the transmitting end under different user quality-of-service requirements and different numbers of users in the embodiment.
[ detailed description of the embodiments ]
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
To minimize the total transmission power of the transmitting end in a wireless energy-carrying downlink communication scenario based on the pattern division multiple access technology, the invention studies a resource allocation method based on a constrained Markov decision process. Specifically, the resource allocation problem in this scenario is described as a constrained Markov decision process, and the constrained problem is converted into an unconstrained Markov decision process using Lagrangian duality theory. Finally, a Q-learning algorithm is proposed to find the optimal solution of the unconstrained Markov decision process. Taking the time-switching receiver as an example: the power allocation matrix, the subcarrier mapping matrix, and the time-slot ratio allocated between information decoding and energy collection are adjusted to their optimal values so as to minimize the total transmission power of the transmitter while satisfying the quality of service of each user.
Step one, constructing the system model: the system model is a wireless energy-carrying downlink communication system based on the pattern division multiple access technology and consists of a base station and a plurality of users;
the specific mode of the first step is as follows:
As shown in FIG. 1, assume there is a base station that wirelessly transmits data and energy to T users in a particular area over K subcarriers, where t ∈ {1, ..., T} and k ∈ {1, ..., K} are the user index and the subcarrier index, respectively. Superposition coding is employed at the transmitter, and the subcarrier mapping is described by the matrix G_PDMA ∈ N^{K×T}, in which K_k = {t | g_{k,t} = 1} (k ∈ K) and |K_k| are the set and the number of users to which the k-th subcarrier is mapped, respectively. The mapping matrix with 3 subcarriers and 5 users is shown in FIG. 1, where K_1 = {1, 2, 3, 4} and |K_1| = 4. The time-switching receiver is taken as an example when solving for the optimal resource allocation strategy. The signal received by user U_t on subcarrier H_k is:
y_{k,t} = h_{k,t} Σ_{i∈K_k} √(P_{k,i}) x_{k,i} + w_{k,t} (1)
where h_{k,t} = r_{k,t} d_t^{−β} is the channel gain from the base station to user U_t through subcarrier H_k, r_{k,t} is the small-scale fading satisfying the Rayleigh distribution, d_t^{−β} is the large-scale fading related to the distance d_t between the base station and the user, and β is the path-loss exponent; P_{k,t} and x_{k,t} are the power and the signal loaded onto user U_t through subcarrier H_k, and w_{k,t} ~ CN(0, σ_k²) is additive white Gaussian noise.
The receiving end adopts successive interference cancellation, and the superposed signals on each subcarrier are decoded in the order determined by the channel-to-noise ratios, where CNR_{k,t} = |h_{k,t}|² / σ_k² is the channel-to-noise ratio of user U_t on subcarrier H_k and the users in K_k are ordered such that CNR_{k,1} ≥ CNR_{k,2} ≥ .... The normalized interference experienced by user U_t on subcarrier H_k is the sum of the powers of the users in K_k whose signals have not yet been cancelled when U_t decodes its own signal, denoted I_{k,t} (2). Thus, the signal-to-interference-plus-noise ratio when the k-th subcarrier is loaded to the t-th user is:
SINR_{k,t} = P_{k,t} CNR_{k,t} / (1 + CNR_{k,t} I_{k,t}) (3)
where the power ordering ensures that the decoding process is not interrupted. The information rate and the energy obtained by user U_t from subcarrier H_k are, respectively:
R_{k,t} = B_k log2(1 + SINR_{k,t}) (4)
E_{k,t} = η |h_{k,t}|² Σ_{i∈K_k} P_{k,i} (5)
where B_k is the bandwidth of subcarrier H_k and η is the energy collection efficiency. In addition, α_t and 1 − α_t are the fractions of the transmission time slot assigned to information decoding and to energy collection, respectively, so the information and the energy collected by each user over a transmission block are:
R_t = α_t Σ_{k=1}^{K} R_{k,t} (6)
E_t = (1 − α_t) Σ_{k=1}^{K} E_{k,t} (7)
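As a concrete illustration of the system model, the following Python sketch computes the per-user SINR, information rate, and harvested energy on one subcarrier under the time-switching receiver, using the reconstructed equations above. All numeric values (distances, powers, bandwidth, time-slot ratios) are placeholders chosen only for this example, and the descending-CNR decoding order is an assumption; the absolute magnitudes of the printed results are not meaningful.

```python
import math
import random

# Sketch of the physical-layer quantities in equations (1)-(7), assuming the
# reconstructed forms above. All numbers below are illustrative placeholders.
beta = 3.76          # path-loss exponent (value used later in the embodiment)
eta = 0.3            # energy collection efficiency
sigma2 = 0.01        # noise power sigma_k^2 in watts
B_k = 1e6            # subcarrier bandwidth in Hz (placeholder)

# Users mapped to subcarrier k (the set K_k), their distances, powers, and
# time-switching ratios alpha_t.
users = [1, 2, 3]
d = {1: 50.0, 2: 120.0, 3: 250.0}        # base-station-to-user distances (m)
P = {1: 0.05, 2: 0.10, 3: 0.20}          # powers P_{k,t} loaded on subcarrier k (W)
alpha = {1: 0.6, 2: 0.6, 3: 0.6}         # fraction of the slot used for decoding

# Channel gain h_{k,t} = r_{k,t} * d_t^(-beta), Rayleigh small-scale fading r.
r = {t: abs(complex(random.gauss(0, 1), random.gauss(0, 1))) / math.sqrt(2)
     for t in users}
h = {t: r[t] * d[t] ** (-beta) for t in users}
cnr = {t: abs(h[t]) ** 2 / sigma2 for t in users}

# Successive interference cancellation: decoding in descending CNR order is
# assumed here, so user t is interfered by the users decoded after it.
order = sorted(users, key=lambda t: cnr[t], reverse=True)
for idx, t in enumerate(order):
    interference = sum(P[i] for i in order[idx + 1:])            # I_{k,t}, eq. (2)
    sinr = P[t] * cnr[t] / (1 + cnr[t] * interference)           # eq. (3)
    rate = alpha[t] * B_k * math.log2(1 + sinr)                  # alpha_t * eq. (4)
    energy = (1 - alpha[t]) * eta * abs(h[t]) ** 2 * sum(P.values())  # (1-alpha_t) * eq. (5)
    print(f"user {t}: SINR={sinr:.3e}, rate={rate:.3e} bit/s, energy={energy:.3e} W")
```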
step two, formulation of a constraint Markov decision problem: the resource allocation problem in the wireless energy-carrying communication system is converted into a constraint Markov decision problem, and the constraint Markov decision problem is converted into the unconstrained Markov decision problem by using Lagrangian dual theory.
The specific implementation manner of the second step is as follows:
the decision maker minimizes the total power of transmission at the transmitting end while meeting the energy requirements and data rate requirements received by each user at the receiving end. The resource allocation problem with user quality of service constraints is denoted as a constrained markov decision problem, which provides a corresponding resource allocation policy for each state. Next, the state space, the action space, the targets, and the constraints of the system will be described separately.
1) State space: to characterize the energy and signal received by the user, we define the state space as:
s = (SINR_{k,t}, k = 0, 1, ..., K, t = 0, 1, ..., T) ∈ S = SINR (8)
where the state set SINR is a finite set of signal-to-interference-plus-noise ratio values.
2) Action space: the transmitter minimizes the total transmission power by controlling the power allocation and the subcarrier mapping, and the receiver does so by controlling the ratio of the time slot allocated to information decoding versus energy collection. Thus, the action space is:
a = (α, G_PDMA, P_PDMA) (9)
where α = (α_1, ..., α_T) and P_PDMA are, respectively, the vector of time-slot ratios that all user receivers allocate to information decoding and the power allocation matrix, and G_PDMA is the subcarrier mapping matrix. In the system, α ∈ A, G_PDMA ∈ G, and P_PDMA ∈ P are discrete, and the sets A, G, P are finite sets of the time-slot ratios allocated to information decoding by all receivers, the subcarrier mappings, and the power allocations, respectively. A minimal enumeration of such a discrete action space is sketched after this paragraph.
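Because the state and action sets are finite and discrete, they can be enumerated directly. The sketch below illustrates one possible discretization of the action space a = (α, G_PDMA, P_PDMA); the grid resolutions and candidate mapping matrices are assumptions made for this example only, not values specified by the patent.

```python
from itertools import product

# Illustrative discretization of the action space a = (alpha, G_PDMA, P_PDMA).
# Grid sizes and candidate matrices are assumptions made for this sketch only.
T, K = 2, 2                                   # users and subcarriers

alpha_levels = [0.2, 0.4, 0.6, 0.8]           # time-slot ratios for information decoding
power_levels = [0.05, 0.10, 0.15, 0.20]       # candidate per-entry powers in watts
mapping_candidates = [                        # candidate subcarrier mapping matrices G_PDMA
    ((1, 0), (0, 1)),                         # each subcarrier serves one user
    ((1, 1), (1, 1)),                         # both subcarriers serve both users
]

def enumerate_actions():
    """Yield every discrete action (alpha vector, G_PDMA, P_PDMA)."""
    for alphas in product(alpha_levels, repeat=T):
        for G in mapping_candidates:
            # allocate a power level to each (subcarrier, user) pair that is mapped
            mapped = [(k, t) for k in range(K) for t in range(T) if G[k][t] == 1]
            for powers in product(power_levels, repeat=len(mapped)):
                P = [[0.0] * T for _ in range(K)]
                for (k, t), p in zip(mapped, powers):
                    P[k][t] = p
                yield alphas, G, tuple(map(tuple, P))

actions = list(enumerate_actions())
print(f"{len(actions)} discrete actions in the illustrative action space")
```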
3) Objective and constraints: the goal is to find the optimal strategy Π such that the total transmission power P_total of the transmitting end is minimized; the constraints are that the minimum energy and data-rate requirements of every user must be met. This resource allocation problem can be translated into a constrained Markov decision process, i.e. P1:
(P1): min_Π P_total (10)
s.t. E_t ≥ E_req, ∀t (11)
R_t ≥ R_req, ∀t (12)
α ∈ A (13)
G_PDMA ∈ G (14)
P_PDMA ∈ P (15)
That is, while satisfying the quality-of-service constraints of each user, a strategy Π is adopted to adaptively adjust the time-slot ratio that all receivers allocate to information decoding, the subcarrier mapping, and the power allocation of the transmitting end so as to minimize the total transmission power of the transmitting end. To solve the constrained Markov problem, Lagrangian duality theory is used to convert it into an unconstrained Markov process. The generalized Lagrangian function is introduced below:
L(λ, μ, Π) = P_total + Σ_{t=1}^{T} λ_t (E_req − E_t) + Σ_{t=1}^{T} μ_t (R_req − R_t) (16)
wherein λ ═ { λ ═ λ1,λ2,λ3,...,λt=T}、μ={μ1,μ2,μ3,...,μt=TIs a set of Lagrangian operators and the element λ1,λ2,λ3,...,λt=TAnd mu1,μ2,μ3,...,μt=TThe lagrange multipliers respectively correspond to the constraints of the energy harvested and the received data rate for each user. Considering L (λ, μ, Π) as a function of λ and μ, defined as:
the value of θ (Π) is P when the receiver satisfies the user quality of service constrainttotal. When the constraint is not satisfied, two groups of Lagrangian operators are positive and infinite, and the value of theta (pi) tends to be infinite, so that the function has no solution. Thus, the θ (Π) function can be described as:
thus, the constrained markov decision process can be relaxed to an unconstrained markov decision process, i.e.:
wherein,andadditionally, pi*Is the optimal strategy. Thus, the optimal resource allocation strategy translates into a saddle point solving the function L (Π, λ, μ). Namely, (II)*,λ*,μ*) It should satisfy:
L(Π,λ*,μ*)≥L(Π*,λ*,μ*)≥L(Π*,λ,μ) (21)
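The role of the Lagrangian relaxation can be illustrated numerically: for a feasible policy the penalty terms are non-positive, so the inner maximum over the multipliers is attained at λ = μ = 0 and θ(Π) = P_total, while for an infeasible policy L grows without bound as the multipliers increase. The sketch below assumes the reconstructed Lagrangian form (16) and uses arbitrary example numbers.

```python
# Sketch of the generalized Lagrangian L(lambda, mu, Pi) from the relaxation above.
# The E_req/R_req values and the per-user samples are arbitrary example numbers.

def lagrangian(p_total, energies, rates, e_req, r_req, lam, mu):
    """L = P_total + sum_t lam_t*(E_req - E_t) + sum_t mu_t*(R_req - R_t)."""
    penalty_e = sum(l * (e_req - e) for l, e in zip(lam, energies))
    penalty_r = sum(m * (r_req - r) for m, r in zip(mu, rates))
    return p_total + penalty_e + penalty_r

e_req, r_req = 0.1, 1.0            # minimum energy (W) and rate (Mbit/s) requirements
feasible   = dict(p_total=0.35, energies=[0.12, 0.11], rates=[1.2, 1.1])
infeasible = dict(p_total=0.25, energies=[0.12, 0.05], rates=[1.2, 0.6])

for mult in (0.0, 1.0, 10.0, 100.0):
    lam = [mult, mult]
    mu = [mult, mult]
    # Feasible policy: penalties are <= 0, so the maximizing multipliers are zero
    # and theta(Pi) = P_total. Infeasible policy: L grows as the multipliers grow,
    # so theta(Pi) = +infinity, matching the case description of theta above.
    print(mult,
          round(lagrangian(lam=lam, mu=mu, e_req=e_req, r_req=r_req, **feasible), 3),
          round(lagrangian(lam=lam, mu=mu, e_req=e_req, r_req=r_req, **infeasible), 3))
```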
since the channel transition probability is difficult to estimate, a Q learning algorithm is proposed to solve the optimal solution of the unconstrained markov decision process.
Step three, obtaining the optimal resource allocation strategy based on the constrained Markov decision process in the wireless energy-carrying communication scenario of the pattern division multiple access technology by using a reinforcement learning method.
The specific implementation manner of the third step is as follows:
the reinforcement learning algorithm is widely applied to learning of an optimal control strategy of a model-free MDP problem, which means that environmental models such as channel conversion do not need to be considered. Therefore, the Q learning algorithm in reinforcement learning is proposed to solve the above resource allocation problem. The Q value calculation formula, the update formula, the epsilon-greedy strategy and the reward function of the Q learning algorithm will be given below respectively. For policy π, the Q value calculation formula when action a is performed at state s is:
Q_π(s, a) = E_π[ r_{k+1} + γ Q_π(s_{k+1}, a_{k+1}) | s_k = s, a_k = a ] (22)
where r_{k+1} and γ are the reward obtained at time k + 1 and the reward discount factor, respectively. In the Q-learning algorithm, the update formula of the Q value is:
Q(s_k, a_k) ← (1 − ρ) Q(s_k, a_k) + ρ [ r_{k+1} + γ max_a Q(s_{k+1}, a) ] (23)
where 0 < ρ < 1 is the learning rate. In state s, action a is chosen according to the ε-greedy strategy so as to balance exploration with making the best overall decision. Thus, the selection of actions follows:
a = argmax_{a∈A} Q(s, a) with probability 1 − ε, and a ~ U(A) with probability ε (24)
where a ~ U(A) randomly chooses an action uniformly from the action space. To directly reflect the target value, the reward is defined as the negative of the Lagrangian objective L(λ, μ, Π) evaluated under the current action (25). In addition, the Lagrange multipliers are calculated and updated using a subgradient method. After the Q value is calculated and updated, the control strategy for problem (P2) can be described as:
π*(s) = argmax_a Q*(s, a) (26)
where Q*(s, a) is the Q value obtained when following the optimal strategy for state s and action a.
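A compact, self-contained sketch of the tabular Q update (23), the ε-greedy selection (24), the Lagrangian reward (25), and the subgradient multiplier update is given below. It runs on a toy abstract environment (random state transitions and a synthetic outcome table) rather than the full PDMA SWIPT model; the environment dynamics, the outcome mapping, and the step size are placeholder assumptions for illustration only.

```python
import random
from collections import defaultdict

# Toy tabular Q-learning sketch for the unconstrained problem (P2).
# The environment is a stand-in: each (state, action) pair is assumed to yield
# a (P_total, per-user energy, per-user rate) tuple; the mapping is synthetic.
random.seed(0)
T = 2                                    # number of users
E_req, R_req = 0.1, 1.0                  # QoS requirements (W, Mbit/s)
gamma, rho, eps = 0.8, 0.6, 0.1          # discount, learning rate, epsilon
states = list(range(4))                  # quantized SINR levels (placeholder)
actions = list(range(8))                 # indices into a discrete action set

outcome = {(s, a): (0.2 + 0.05 * a,                 # P_total
                    [0.05 + 0.02 * a] * T,          # E_t per user
                    [0.5 + 0.2 * a] * T)            # R_t per user
           for s in states for a in actions}

Q = defaultdict(float)
lam = [0.0] * T
mu = [0.0] * T
step = 0.01                              # subgradient step size (assumption)

def reward(p_total, energies, rates):
    """Negative Lagrangian, i.e. the reward (25) as reconstructed above."""
    return -(p_total
             + sum(l * (E_req - e) for l, e in zip(lam, energies))
             + sum(m * (R_req - r) for m, r in zip(mu, rates)))

s = random.choice(states)
for k in range(2500):                    # k_max iterations
    # epsilon-greedy action selection, eq. (24)
    if random.random() < eps:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda x: Q[(s, x)])
    p_total, energies, rates = outcome[(s, a)]
    r = reward(p_total, energies, rates)
    s_next = random.choice(states)       # placeholder state transition
    # Q-value update, eq. (23)
    best_next = max(Q[(s_next, x)] for x in actions)
    Q[(s, a)] = (1 - rho) * Q[(s, a)] + rho * (r + gamma * best_next)
    # subgradient update of the Lagrange multipliers, projected onto >= 0
    lam = [max(0.0, l + step * (E_req - e)) for l, e in zip(lam, energies)]
    mu = [max(0.0, m + step * (R_req - rt)) for m, rt in zip(mu, rates)]
    s = s_next

policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}  # eq. (26)
print("greedy action per state:", policy)
```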
Embodiment:
the diagrams provided in the following examples and the setting of specific parameter values in the models are mainly for explaining the basic idea of the present invention and performing simulation verification on the present invention, and can be appropriately adjusted according to the actual scene and requirements in the specific application environment.
The invention considers a wireless energy-carrying communication scenario based on the pattern division multiple access technology in which the transmitter and every receiver are equipped with a single antenna. The effectiveness of the proposed method is demonstrated by simulation from four aspects: (1) the convergence of the algorithm under different learning rates is compared; (2) the total transmission power of the transmitting end is examined as the received-energy requirement of the users varies, comparing the proposed constrained-Markov-process-based Q-learning algorithm with the genetic-algorithm-based DBN algorithm; (3) the total transmission power of the transmitting end is examined as the data-rate requirement of the users varies, again comparing the proposed algorithm with the genetic-algorithm-based DBN algorithm; (4) the minimum total transmission power of the transmitting end is examined as the number of receiving-end users and the user quality-of-service requirements change.
In the simulation, we assume that all users are distributed within a circle of radius 300 meters centered at the base station, and the user distances d_t are randomly generated in (0, 300) meters. The path-loss exponent β is assumed to be 3.76. To model the energy requirement of the receiving end, the power conversion efficiency of the energy-harvesting receiver is assumed to be η = 30%. The maximum available transmit power is set to a fixed value P, and the noise power is set to σ² = 0.01 W. To learn the Q values, an action set satisfying the constraints (13), (14), and (15) is constructed, so the state space is a finite set corresponding to this action space. The other parameters are set as k_max = 2500, ε = 0.1, and γ = 0.8. Three performance indicators are used in the simulation: the total transmission power of the transmitting end, the energy harvested at the receiving end, and the data rate. The quality of a resource allocation strategy is characterized by these performance indicators.
As shown in fig. 2, the convergence of the total transmission power under different learning rates is studied in order to determine a suitable learning rate, where ρ is set to 0.4, 0.5, and 0.6, respectively. The number of users and the number of subcarriers are both set to 2, and the harvested-energy constraint and the data-rate constraint are set to E_req = 0.1 W and R_req = 1 Mbit/s, respectively. It can be observed that the total transmission power converges to about 0.35 W for all learning rates, although the convergence speed and stability differ. Considering both factors, a learning rate of 0.6 is adopted. Because the algorithm uses an ε-greedy strategy, the total transmission power of the resource allocation scheme based on the constrained Markov process fluctuates slightly as the number of iterations increases, but the overall trend is not affected.
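For reference, the simulation settings from the two preceding paragraphs can be gathered in one place; this merely restates parameters already given in the text (with the learning rate ρ = 0.6 selected from the Fig. 2 comparison) and defines nothing new.

```python
# Simulation parameters of the embodiment, gathered from the text above.
simulation_params = {
    "cell_radius_m": 300,        # users placed within 300 m of the base station
    "path_loss_exponent": 3.76,  # beta
    "energy_conversion_eta": 0.30,
    "noise_power_w": 0.01,       # sigma^2
    "k_max_iterations": 2500,
    "epsilon_greedy": 0.1,
    "discount_gamma": 0.8,
    "learning_rate_rho": 0.6,    # selected in Fig. 2 among {0.4, 0.5, 0.6}
    "E_req_w": 0.1,              # harvested-energy requirement (Fig. 2 setting)
    "R_req_mbit_s": 1.0,         # data-rate requirement (Fig. 2 setting)
}
print(simulation_params)
```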
As shown in figs. 3 and 4, the effectiveness of the algorithm is studied by comparing the performance of the proposed Q-learning-based algorithm and the DBN algorithm under different user quality-of-service requirements. In the simulation, the number of receiving-end users is set to 3. The results show that the proposed algorithm is effective and can significantly reduce the total transmission power.
Finally, fig. 5 shows the minimum total transmission power of the transmitting end estimated by the proposed Q-learning algorithm under different numbers of users and different user quality-of-service constraints, where the number of users at the receiver is set to 2, 3, and 4, respectively. As shown in fig. 5, the total transmission power of the transmitting end increases as the user quality-of-service requirements increase, and the increase of the minimum total transmission power becomes more pronounced as the number of users grows. These results verify the effectiveness and reasonableness of the proposed algorithm.
Claims (2)
1. A resource allocation method of wireless energy-carrying communication technology is characterized by comprising the following steps:
step one, formulating a constrained Markov decision process:
describing the resource allocation problem in a wireless energy-carrying communication scenario based on the pattern division multiple access technology as a constrained Markov decision process, and converting the problem into an unconstrained Markov decision process by using the Lagrangian dual method;
step two, solving the unconstrained Markov decision process in the step one by using a reinforcement learning method to finally obtain an optimal resource allocation strategy; the strategy aims to minimize the total transmission power of a transmitting end on the premise of meeting the service quality of each user at a receiving end;
the wireless energy-carrying downlink communication scenario is constructed as a system model, wherein the system model specifically comprises: a base station wirelessly transmitting data and energy to T users in a specific area through K subcarriers, wherein the transmitting end adopts superposition coding, the receiving end adopts successive interference cancellation, and the base station at the transmitting end and the users at the receiving end are each equipped with a single antenna; the users are randomly distributed within a circle of radius r centered at the base station;
the first step is specifically as follows:
1) according to the system model, defining a state space and an action space of the system:
the state space of the system is specifically as follows:
s = (SINR_{k,t}, k = 0, 1, ..., K, t = 0, 1, ..., T) ∈ S = SINR (1),
where SINR_{k,t} is the signal-to-interference-plus-noise ratio when the k-th subcarrier is loaded to the t-th user, and the state set SINR is a finite set of SINR values;
the action space of the system is specifically as follows:
a = (α, G_PDMA, P_PDMA) (2),
where α = (α_1, ..., α_T) is the vector of transmission-time ratios assigned to information decoding by the T users, P_PDMA is the power allocation matrix, and G_PDMA is the subcarrier mapping matrix; in the system, α ∈ A, G_PDMA ∈ G, and P_PDMA ∈ P are discrete, and the sets A, G, P are finite sets of the time-slot ratios allocated to information decoding by all receivers, the subcarrier mappings, and the power allocations, respectively;
2) The constrained Markov decision process is detailed as follows:
(P1): min_Π P_total (3)
s.t. E_t ≥ E_req, ∀t (4)
R_t ≥ R_req, ∀t (5)
where P_total is the total transmission power of the transmitting end; equations (4) and (5) represent the quality-of-service constraints of each user, i.e. the energy E_t received by each user and its data rate R_t are required to satisfy the minimum energy requirement E_req and the minimum data rate requirement R_req, respectively; the Markov decision process is described as adjusting the actions α, G_PDMA, P_PDMA to minimize the total transmission power of the transmitting end under the constraint of satisfying the quality of service of each user;
the constrained Markov decision process can be relaxed to an unconstrained Markov process, i.e.:
min_Π max_{λ≥0, μ≥0} L(λ, μ, Π) = P_total + Σ_{t=1}^{T} λ_t (E_req − E_t) + Σ_{t=1}^{T} μ_t (R_req − R_t),
where λ and μ are the two sets of Lagrange multipliers; Π* is the optimal resource allocation strategy, which is converted into a saddle point of the function L(λ, μ, Π); the policy Π represents the resource allocation policy of the system, and E_i and R_i represent the energy and the information rate received by user i when the system adopts resource allocation policy Π; λ = {λ_1, λ_2, λ_3, ..., λ_T} and μ = {μ_1, μ_2, μ_3, ..., μ_T} are the sets of Lagrange multipliers, whose elements λ_1, ..., λ_T and μ_1, ..., μ_T correspond to the constraints on the harvested energy and the received data rate of each user, respectively; L(λ, μ, Π) is the unconstrained Markov resource allocation problem.
2. The method as claimed in claim 1, wherein the updating formula of the Q value in the reinforcement learning in the second step is as follows:
Q(s_k, a_k) ← (1 − ρ) Q(s_k, a_k) + ρ [ r_{k+1} + γ max_a Q(s_{k+1}, a) ],
where r_{k+1}, γ, and ρ (0 < ρ < 1) are the reward obtained at time k + 1, the reward discount factor, and the learning rate, respectively;
the optimum function is expressed as follows:
π*(s) = argmax_a Q*(s, a),
where Q*(s, a) is the Q value obtained when the optimal policy is followed for state s and action a.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010113438.6A CN111212438B (en) | 2020-02-24 | 2020-02-24 | Resource allocation method of wireless energy-carrying communication technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010113438.6A CN111212438B (en) | 2020-02-24 | 2020-02-24 | Resource allocation method of wireless energy-carrying communication technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111212438A CN111212438A (en) | 2020-05-29 |
CN111212438B true CN111212438B (en) | 2021-07-16 |
Family
ID=70789128
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010113438.6A Active CN111212438B (en) | 2020-02-24 | 2020-02-24 | Resource allocation method of wireless energy-carrying communication technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111212438B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113542124B (en) * | 2021-06-25 | 2022-12-09 | 西安交通大学 | Credit-driven cooperative transmission method in D2D cache network |
CN113938917A (en) * | 2021-08-30 | 2022-01-14 | 北京工业大学 | Heterogeneous B5G/RFID intelligent resource distribution system applied to industrial Internet of things |
TWI812371B (en) * | 2022-07-28 | 2023-08-11 | 國立成功大學 | Resource allocation method in downlink pattern division multiple access system based on artificial intelligence |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105407535A (en) * | 2015-10-22 | 2016-03-16 | 东南大学 | High energy efficiency resource optimization method based on constrained Markov decision process |
CN110113179A (en) * | 2019-02-22 | 2019-08-09 | 华南理工大学 | A kind of resource allocation methods for taking energy NOMA system based on deep learning |
CN110602730A (en) * | 2019-09-19 | 2019-12-20 | 重庆邮电大学 | Resource allocation method of NOMA (non-orthogonal multiple access) heterogeneous network based on wireless energy carrying |
-
2020
- 2020-02-24 CN CN202010113438.6A patent/CN111212438B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105407535A (en) * | 2015-10-22 | 2016-03-16 | 东南大学 | High energy efficiency resource optimization method based on constrained Markov decision process |
CN110113179A (en) * | 2019-02-22 | 2019-08-09 | 华南理工大学 | A kind of resource allocation methods for taking energy NOMA system based on deep learning |
CN110602730A (en) * | 2019-09-19 | 2019-12-20 | 重庆邮电大学 | Resource allocation method of NOMA (non-orthogonal multiple access) heterogeneous network based on wireless energy carrying |
Non-Patent Citations (2)
Title |
---|
A Deep Learning-Based Approach to Power Minimization in Multi-Carrier NOMA With SWIPT; JINGCI LUO et al.; 《IEEE Access》; 20190214; Abstract, Sections I–III *
Learning-Aided Resource Allocation for Pattern Division Multiple Access Based SWIPT Systems; Lixin Li et al.; 《IEEE Wireless Communications Letters (Early Access)》; 20200910; entire document *
Also Published As
Publication number | Publication date |
---|---|
CN111212438A (en) | 2020-05-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |