CN115052355B - Network-assisted full duplex mode optimization method under mass terminals URLLC - Google Patents
Info
- Publication number
- CN115052355B (application CN202210649515.9A)
- Authority
- CN
- China
- Prior art keywords
- rau
- downlink
- algorithm
- uplink
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L5/00—Arrangements affording multiple use of the transmission path
- H04L5/14—Two-way operation using the same type of signal, i.e. duplex
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention relates to a network-assisted full-duplex (NAFD) mode optimization method under massive-terminal URLLC. Addressing the problem of maximizing the uplink and downlink resource utilization efficiency of network-assisted full duplex in a cell-free massive MIMO scenario, it proposes a scalable WoLF-PHC intelligent algorithm that runs in a distributed manner. Logical scalability is achieved by treating each remote antenna unit (RAU) as an agent containing a local processor that performs its own associated data-processing tasks and optimizes a local performance metric based on the decisions of the other RAUs. When a new RAU joins the system, the computing power of the central processing unit (CPU) does not need to be upgraded and the data reported by all RAUs does not need to be retrained at the CPU; distributed execution therefore makes the system algorithm scalable. The proposed distributed intelligent algorithm is better suited to the dynamic scenario of massive-terminal URLLC, has lower complexity, and requires less storage space than a conventional centralized Q-learning algorithm.
Description
Technical Field
The invention relates to a scalable duplex-mode optimization method for network-assisted full-duplex cell-free massive MIMO scenarios, applicable to ultra-reliable low-latency communication (URLLC) for massive terminals, and belongs to the technical field of mobile communication.
Background
Full-duplex (FD) technology helps improve system throughput and reduce system latency, which matters in ultra-reliable low-latency communication (URLLC) scenarios. Network-assisted full duplex (NAFD) under the cell-free distributed massive MIMO architecture is a flexible new duplex technology; the system consists of a central processing unit (CPU), multiple remote antenna units (RAUs), and multiple users. Each RAU can perform either uplink reception or downlink transmission, and the specific transmission mode is decided by the CPU. Compared with conventional time-division duplexing, NAFD can provide lower-latency service; compared with conventional frequency-division duplexing, NAFD can support asymmetric traffic without degrading spectrum utilization. NAFD flexible duplexing can support URLLC for massive terminals: by letting the CPU schedule the uplink or downlink mode of each RAU, it reduces the collision delay caused by grant-free access in URLLC and guarantees the reliability of user access; moreover, NAFD has no RAU self-interference and therefore avoids the self-interference cancellation delay of conventional full duplex.
With the explosive growth in the number of mobile terminal users, the system resource utilization problem and the reliable, fast access mechanism required by massive terminals remain to be studied. In massive-terminal URLLC scenarios, the node mode-selection mechanism of NAFD flexible duplexing must be studied jointly with URLLC-specific factors such as short-packet transmission and bit-error-rate constraints. The scalability of the algorithm demanded by massive terminals and practical systems also remains to be studied.
Disclosure of Invention
Technical problem: aiming at making the load-aware network-assisted full-duplex mode optimization technique suitable for massive-terminal URLLC so as to maximize the resource utilization of the system, the invention provides a network-assisted full-duplex mode optimization method under massive-terminal URLLC.
Technical scheme: the network-assisted full-duplex mode optimization method under massive-terminal URLLC provided by the invention is optimized with a WoLF-PHC-based intelligent algorithm and comprises the following steps:
step 1: defining a load-aware utility function for each user i:
where U_i is the load-aware utility function, used to characterize the resource utilization of the system; k is the number of allocatable resource blocks each remote antenna unit (RAU) has, K is the total number of users, n_{m,i} is the number of resource blocks allocated to user i by the m-th RAU (RAU m for short), n_{m,a} is the number of resource blocks allocated to user a by RAU m, and the summation term is the total number of resource blocks RAU m allocates to all users; n_{m,i} can be calculated by the following formula:
where the first quantity is the bandwidth user i requires according to its own quality of service (QoS), b is the bandwidth occupied by each resource block, γ_i is the signal-to-interference-plus-noise ratio (SINR) of user i, R_i is the short-packet achievable rate in the URLLC scenario, V(γ_i) is the channel dispersion of user i, m is the short-packet blocklength, Q^{-1}(·) is the inverse Q-function, ε_0 is the decoding error probability (DEP), e is the base of the natural logarithm, and ⌈·⌉ denotes rounding up;
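The closed forms of R_i and n_{m,i} are not reproduced above; since the listed symbols (V(γ_i), m, Q^{-1}(·), ε_0) match the standard short-blocklength normal approximation, a hedged reconstruction under that assumption is:

```latex
% Hedged reconstruction (assumption): normal approximation for the short-packet rate,
% with B_i^{QoS} a placeholder name for the bandwidth user i needs to meet its QoS.
R_i \approx \log_2(1+\gamma_i) - \sqrt{\tfrac{V(\gamma_i)}{m}}\,Q^{-1}(\varepsilon_0)\,\log_2 e,
\qquad V(\gamma_i) = 1 - \frac{1}{(1+\gamma_i)^2},
\qquad n_{m,i} = \Bigl\lceil \tfrac{B_i^{\mathrm{QoS}}}{b} \Bigr\rceil .
```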
Step 2: the optimization objective is to maximize the users' load-aware resource utility function:
where U_{U,i} is the load-aware utility function value of uplink user i, U_{D,j} is the load-aware utility function value of downlink user j, K_u is the number of uplink users, and K_d is the number of downlink users; the subscripts u and d identify the uplink and downlink, and i and j index the i-th uplink user and the j-th downlink user, respectively. To determine in which mode each RAU should operate, two binary allocation vectors x_u, x_d ∈ {0,1}^{M×1} are defined, with M the total number of RAUs; the entry of x_u (or x_d) corresponding to an RAU takes the value 1 if that RAU is used for the uplink (or downlink), and 0 otherwise. The effective load-aware utility function values of the uplink and downlink can be represented by formulas (5) and (6), respectively:
where X_u = diag(x_u) and X_d = diag(x_d), with diag(a) denoting the diagonal matrix formed from the vector a; M_u is the number of uplink RAUs, M_d is the number of downlink RAUs, k_{U,m} is the number of resource blocks available for allocation at uplink RAU m, k_{D,m} is the number of resource blocks available for allocation at downlink RAU m, n_{m,i} is the number of resource blocks RAU m allocates to uplink user i when that user's QoS requirement is satisfied, and n_{m,j} is the number of resource blocks RAU m allocates to downlink user j when that user's QoS requirement is satisfied;
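For readability, the mode-selection problem implied by the definitions above can be sketched as follows; the constraint that every RAU serves exactly one direction is an assumption drawn from the NAFD description, not a formula reproduced from the filing:

```latex
% Hedged sketch (assumption): each RAU is assigned to exactly one of uplink/downlink.
\max_{x_u,\,x_d \in \{0,1\}^{M\times 1}} \;
\sum_{i=1}^{K_u} U_{U,i}(X_u) + \sum_{j=1}^{K_d} U_{D,j}(X_d)
\qquad \text{s.t.}\;\; x_u + x_d = \mathbf{1}_M,\;\;
X_u = \mathrm{diag}(x_u),\;\; X_d = \mathrm{diag}(x_d).
```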
Step 3: optimize the resource utility function with the intelligent algorithm, and store the algorithm's final state set and rewards as the optimal RAU duplex modes and the maximized resource utilization efficiency.
Wherein:
The intelligent algorithm based on WoLF-PHC is as follows:
WoLF (Win or Learn Fast) means adjusting parameters cautiously and slowly when the agent is doing better than expected, and adjusting them quickly when it is doing worse than expected;
PHC (policy hill-climbing) is a learning algorithm for a single agent in a stationary environment; its core is the idea of ordinary reinforcement learning, increasing the selection probability of the action with the largest expected cumulative return, and the algorithm can converge to an optimal policy;
The WoLF-PHC intelligent algorithm is a scalable algorithm suited to distributed execution by multiple agents. It combines WoLF and PHC, so that an agent adjusts quickly to adapt to the policy changes of other agents when its rewards are worse than expected, and learns cautiously when its rewards are better than expected, giving the other agents time to adapt to its policy changes. The WoLF-PHC algorithm can converge to a Nash equilibrium policy, and when the other agents adopt fixed policies it converges to the optimal policy under the current conditions rather than to a possibly poor Nash equilibrium. The WoLF-PHC algorithm does not need to observe the policies, actions, or reward values of other agents, needs less space to record the Q values, and improves its policy by learning through the PHC algorithm, so no linear or quadratic programming is needed to solve for a Nash equilibrium, which speeds up the algorithm. In the massive-terminal URLLC scenario, distributed operation makes the algorithm logically scalable.
The WoLF-PHC algorithm maintains an average estimation policy, whose update follows the following equation:
where π_i(s,a_i) is the policy for the given state-action pair and C(s) is the number of times state s has occurred; π_i(s,a_i) is updated as follows:
where Q(s,a) is the action-value function obtained by taking action a in state s, updated according to formula (8). The policy is updated by an increment or a decrement: when the currently selected action a_i is not the action that maximizes the Q value, the policy is decreased by δ_{sa_i}; when a_i is the action that maximizes the Q value, the policy is increased accordingly. The value of δ_{sa} in turn depends on the estimation policy π_i(s,a_i); δ is an update auxiliary parameter, |A_i| is the size of the action space, and the specific value of δ is given by (5): δ_w is the positive update auxiliary parameter used when the agent's rewards are better than expected, and δ_l is the negative update auxiliary parameter used when the agent's rewards are worse than expected;
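The update rules referred to above follow the standard WoLF-PHC formulation; a hedged reconstruction using the symbols just defined is:

```latex
% Hedged reconstruction (assumption): standard WoLF-PHC updates (Bowling & Veloso),
% expressed with the symbols defined in the surrounding text.
\bar{\pi}_i(s,a) \leftarrow \bar{\pi}_i(s,a) + \frac{1}{C(s)}\bigl(\pi_i(s,a) - \bar{\pi}_i(s,a)\bigr),
\qquad
\pi_i(s,a_i) \leftarrow \pi_i(s,a_i) +
\begin{cases}
\sum_{a'\neq a_i}\delta_{s a'}, & a_i = \arg\max_{a'} Q(s,a'),\\
-\,\delta_{s a_i}, & \text{otherwise,}
\end{cases}
\\[4pt]
\delta_{s a} = \min\!\Bigl(\pi_i(s,a),\, \tfrac{\delta}{|A_i|-1}\Bigr),
\qquad
\delta =
\begin{cases}
\delta_w, & \sum_{a}\pi_i(s,a)\,Q(s,a) > \sum_{a}\bar{\pi}_i(s,a)\,Q(s,a),\\
\delta_l, & \text{otherwise.}
\end{cases}
```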
In the WoLF-PHC-based intelligent algorithm, each remote antenna unit (RAU) in the system is treated independently as an agent; data detection and node mode selection are performed locally and need not be uploaded to the central processing unit (CPU) for centralized computation. For each agent, the state space has only two states, s_t ∈ {s_1, s_2}, where s_1 indicates the RAU operates in uplink reception and s_2 indicates it operates in downlink transmission; the action space has only two actions, a_t ∈ {a_1, a_2}, where a_1 means the RAU switches its current operating mode and a_2 means it keeps the current mode unchanged. The Q table is therefore of size 2×2, and if the total number of RAUs is M, storing all Q values requires only M×2×2 entries, which is far smaller than the 2^M×M Q-table storage required for centralized processing at the CPU, with correspondingly lower complexity.
When the RAU is in uplink reception mode, the reward is given by the following formula:
When the RAU is in downlink transmission mode, the reward is given by the following formula:
the Q value is updated according to the following formula:
where α is the learning rate, s_t and a_t are the state and action at time t, and the reward R_{t+1} is the feedback the agent receives from the environment after taking action a_t in state s_t at time t; the discount factor γ defines the importance of future rewards, with a value of 0 meaning only short-term rewards are considered and a value closer to 1 placing more emphasis on long-term rewards.
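The description of this update matches the standard single-agent Q-learning rule, so a hedged reconstruction of formula (8) is:

```latex
% Hedged reconstruction (assumption): standard Q-learning update with the symbols above.
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\Bigl[R_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\Bigr].
```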
The specific steps of the WoLF-PHC algorithm are as follows:
Step 1: with M the total number of RAUs, generate M all-zero Q tables of size 2×2 and initialize the channel state information, namely the channel vectors between all downlink RAUs and the downlink user j receiving the signal, the downlink precoding vectors, the channel vector G_{i,j} between the i-th uplink user and the j-th downlink user, the channels between the i-th uplink user and all uplink RAUs, and the true interference channel matrix G_I between the downlink RAUs and the uplink RAUs; initialize the learning rate α and the discount factor γ, the positive update auxiliary parameter δ_w and the negative update auxiliary parameter δ_l, the policy and the average estimation policy, where |A_i| is the size of the action space; initialize the number of occurrences of each state to C(s)=0;
Step 2: if the state of the RAU is currently uplink reception, select an action according to the policy and then compute the reward according to formula (6); if the state of the RAU is currently downlink transmission, select an action according to the policy and then compute the reward according to formula (7);
Step 3: the current state transitions to the next state according to the selected action;
Step 4: update the Q values in the Q table according to formula (8);
Step 5: for each action, update the average estimation policy according to formula (1);
Step 6: update the policy for each action according to the Q values and formulas (2)–(5);
Step 7: return to Step 2 and continue learning and training until the policy and the Q values in the Q table converge;
Step 8: return the state and reward of each agent's optimal solution, corresponding to the uplink/downlink mode of each RAU and the maximum user resource utility function value.
Beneficial effects: the invention provides a scalable duplex-mode optimization method for network-assisted full-duplex cell-free massive MIMO applicable to the massive-terminal ultra-reliable low-latency communication (URLLC) scenario, and proposes a highly scalable WoLF-PHC intelligent algorithm for multi-agent distributed operation to maximize the uplink and downlink resource utilization efficiency of network-assisted full duplex in the cell-free massive MIMO scenario. Logical scalability is achieved by treating each RAU as an agent containing a local processor, which performs its own associated data-processing tasks and optimizes a local performance metric based on the decisions of the other RAUs. When a new remote antenna unit (RAU) joins the system, the computing power of the CPU does not need to be upgraded and the data transmitted by all RAUs does not need to be retrained at the CPU; distributed execution makes the system algorithm scalable. The proposed distributed intelligent algorithm is better suited to the dynamic scenario of massive-terminal URLLC, has lower complexity, and requires less storage space than the conventional centralized Q-learning algorithm.
Drawings
The drawings show the scenario constructed in the example problem and a comparison of the resource utility function under the WoLF-PHC algorithm and other algorithms.
Fig. 1 is a position distribution diagram of uniformly distributed RAUs and randomly distributed uplink and downlink users;
FIG. 2 is a graph comparing CDFs of resource utility functions under different algorithms.
Fig. 3 is a schematic flow chart of the present invention.
Detailed Description
The invention is described in detail below with reference to examples:
Consider a cell-free massive MIMO scenario in which the M RAUs are evenly distributed within a circle of radius 600 m. The system comprises K randomly distributed users, namely K_u uplink users and K_d downlink users, all distributed randomly within a circle of radius 1000 m. Assume M = 6, K_u = 20, and K_d = 20; the specific scene layout is shown in Fig. 1. The noise power is set to −90 dBm, the uplink transmit power is 30 dBm, the downlink transmit power is 23 dBm, and the path loss is 128.1 + 37.6·log10(d) dB.
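For illustration, a minimal Python sketch of this scene setup is given below; the helper names, the even placement of RAUs on a ring, the uniform random user placement, and the distance unit (d in km) in the path-loss formula are assumptions, not details taken from the patent.

```python
import numpy as np

# Minimal sketch of the simulation scene described above (assumptions noted inline).
M, K_u, K_d = 6, 20, 20                     # RAUs, uplink users, downlink users
RAU_RADIUS, USER_RADIUS = 600.0, 1000.0     # metres

# RAUs evenly spaced on a circle of radius 600 m (one plausible reading of
# "evenly distributed"; the exact layout is not specified in the text).
angles = 2 * np.pi * np.arange(M) / M
rau_xy = RAU_RADIUS * np.stack([np.cos(angles), np.sin(angles)], axis=1)

def random_users(n, radius, rng):
    """Users drawn uniformly over a disc of the given radius (assumption)."""
    r = radius * np.sqrt(rng.uniform(size=n))       # sqrt gives uniform area density
    theta = rng.uniform(0, 2 * np.pi, size=n)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

rng = np.random.default_rng(0)
ul_xy = random_users(K_u, USER_RADIUS, rng)
dl_xy = random_users(K_d, USER_RADIUS, rng)

def path_loss_db(d_m):
    """Path loss 128.1 + 37.6*log10(d) as stated in the text (d in km is an assumption)."""
    return 128.1 + 37.6 * np.log10(np.maximum(d_m, 1.0) / 1000.0)

# Distances from each uplink user to each RAU (metres), for later SINR computation.
d_ul = np.linalg.norm(ul_xy[:, None, :] - rau_xy[None, :, :], axis=-1)

NOISE_DBM, P_UL_DBM, P_DL_DBM = -90.0, 30.0, 23.0   # powers from the text
```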
The implementation method of the invention in the system is as follows:
(1) Defining a load-aware utility function for each user i:
where U_i is the load-aware utility function used to characterize the resource utilization of the system, k is the number of allocatable resource blocks each remote antenna unit (RAU) has, and K is the total number of users; n_{m,i} is the number of resource blocks allocated to user i by RAU m, which can be calculated by the following equation:
where the first quantity is the bandwidth user i requires according to its quality of service (QoS), b is the bandwidth occupied by each resource block, γ_i is the signal-to-interference-plus-noise ratio (SINR) of user i, R_i is the short-packet achievable rate in the URLLC scenario, V(γ_i) is the channel dispersion of user i, m is the short-packet blocklength, Q^{-1}(·) is the inverse Q-function, ε_0 is the decoding error probability (DEP), e is the base of the natural logarithm, and ⌈·⌉ denotes rounding up. By the properties of the logarithmic function, when the whole network is overloaded the user preferentially selects the RAU that needs to allocate it the fewest resource blocks, provided its quality of service can still be met, which improves the overall resource-block utilization of the system. As the overall load of the network increases, the value of the user's load-aware utility function decreases; conversely, as the number of allocatable resource blocks owned by the RAU increases, the value of the load-aware utility function increases. If an RAU cannot guarantee the quality of service of user i, it does not provide resource blocks to user i and U_i = 0. U_i can therefore be used as a load-aware utility function to characterize the resource utilization of the system.
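The closed form of U_i is not reproduced above; purely as an illustration of the monotonicity properties just described, one logarithmic form consistent with them (an assumption, not the patent's expression) would be:

```latex
% Illustrative form only (assumption): decreasing in the RAU's total load and in n_{m,i},
% increasing in the number of allocatable resource blocks k.
U_i = \log_2\!\Bigl(1 + \frac{k - \sum_{a=1}^{K} n_{m,a}}{n_{m,i}}\Bigr),
\qquad U_i = 0 \ \text{if the QoS of user } i \text{ cannot be met.}
```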
(2) The optimization objective is to maximize the user's resource utility function based on load awareness:
where U_{U,i} is the load-aware utility function value of uplink user i, U_{D,j} is the load-aware utility function value of downlink user j, K_u is the number of uplink users, and K_d is the number of downlink users; the subscripts u and d identify the uplink and downlink, and i and j index the i-th uplink user and the j-th downlink user, respectively. To determine in which mode each remote antenna unit (RAU) should operate, two binary allocation vectors x_u, x_d ∈ {0,1}^{M×1} are defined, with M the total number of RAUs; the entry of x_u (or x_d) corresponding to an RAU takes the value 1 if that RAU is used for the uplink (or downlink), and 0 otherwise. The effective load-aware utility function values of the uplink and downlink can be represented by equations (5) and (6), respectively:
where X_u = diag(x_u) and X_d = diag(x_d), with diag(a) denoting the diagonal matrix formed from the vector a; M_u is the number of uplink RAUs, M_d is the number of downlink RAUs, k_{U,m} is the number of resource blocks available for allocation at uplink RAU m, k_{D,m} is the number of resource blocks available for allocation at downlink RAU m, n_{m,i} is the number of resource blocks RAU m allocates to uplink user i when that user's QoS requirement is satisfied, and n_{m,j} is the number of resource blocks RAU m allocates to downlink user j when that user's QoS requirement is satisfied.
(3) Optimize the resource utility function with an intelligent algorithm, and store the algorithm's final state set and rewards as the optimal RAU duplex modes and the maximized resource utilization efficiency.
To realize the load-aware network-assisted full-duplex mode optimization technique in a scenario suitable for massive-terminal URLLC, a WoLF-PHC-based Q-learning intelligent algorithm is provided:
WoLF (Win or Learn Fast) means adjusting parameters cautiously and slowly when the agent is doing better than expected, and adjusting them quickly when it is doing worse than expected.
PHC (policy hill-climbing) is a learning algorithm for a single agent in a stationary environment. Its core is the idea of ordinary reinforcement learning, increasing the selection probability of the action with the largest expected cumulative return. The algorithm is rational and can converge to an optimal policy.
The WoLF-PHC intelligent algorithm is a scalable algorithm suited to distributed execution by multiple agents. It combines the WoLF and PHC algorithms, so that an agent adjusts quickly to adapt to the policy changes of other agents when its rewards are worse than expected, and learns cautiously when its rewards are better than expected, giving the other agents time to adapt to its policy changes. The WoLF-PHC algorithm can converge to a Nash equilibrium policy, and when the other agents adopt fixed policies it converges to the optimal policy under the current conditions rather than to a possibly poor Nash equilibrium. The WoLF-PHC algorithm does not need to observe the policies, actions, or reward values of other agents, needs less space to record the Q values, and improves its policy by learning through the PHC algorithm, so no linear or quadratic programming is needed to solve for a Nash equilibrium, which speeds up the algorithm. In the massive-terminal URLLC scenario, distributed operation makes the algorithm logically scalable. The average estimation policy is updated according to the following equation:
where π_i(s,a_i) is the policy for the given state-action pair and C(s) is the number of times state s has occurred. π_i(s,a_i) is updated as follows:
where Q(s,a) is the action-value function obtained by taking action a in state s, updated according to formula (14).
In the WoLF-PHC algorithm, each RAU in the system is treated independently as an agent; data detection and node mode selection are performed locally and need not be uploaded to the central processing unit (CPU) for centralized computation. For each agent, the state space has only two states, s_t ∈ {s_1, s_2}, where s_1 indicates the RAU operates in uplink reception and s_2 indicates it operates in downlink transmission; the action space has only two actions, a_t ∈ {a_1, a_2}, where a_1 means the RAU switches its current operating mode and a_2 means it keeps the current mode unchanged. The Q table is therefore of size 2×2, and if the total number of RAUs is M, storing all Q values requires only M×2×2 entries, far smaller than the 2^M×M Q-table storage required for centralized processing at the CPU, with correspondingly lower complexity. When the RAU is in uplink reception mode, the reward is given by the following formula:
When the RAU is in downlink transmission mode, the reward is given by the following equation:
the update formula of the Q value is as follows:
where α is the learning rate, s_t and a_t are the state and action at time t, and the reward R_{t+1} is the feedback the agent receives from the environment after taking action a_t in state s_t at time t; the discount factor γ defines the importance of future rewards, with a value of 0 meaning only short-term rewards are considered and a value closer to 1 placing more emphasis on long-term rewards. The specific steps of the algorithm are as follows (a Python sketch of one agent's update loop is given after the listed steps):
① With M the total number of RAUs, generate M all-zero Q tables of size 2×2 and initialize the channel state information, namely the channel vectors between all downlink RAUs and the downlink user j receiving the signal, the downlink precoding vectors, the channel vector G_{i,j} between the i-th uplink user and the j-th downlink user, the channels between the i-th uplink user and all uplink RAUs, and the true interference channel matrix G_I between the downlink RAUs and the uplink RAUs; initialize the policy and the average estimation policy, where |A_i| is the size of the action space, and initialize C(s)=0;
② If the state of the RAU is currently uplink reception, select an action according to the policy and then compute the reward according to formula (12); if the state of the RAU is currently downlink transmission, select an action according to the policy and then compute the reward according to formula (13);
③ The current state transitions to the next state according to the selected action;
④ Update the Q values in the Q table according to formula (14);
⑤ For each action, update the average estimation policy according to formula (7);
⑥ Update the policy for each action according to the Q values and formulas (8)–(11);
⑦ Return to step ② and continue learning and training until the policy and the Q values in the Q table converge;
⑧ Return the state and reward of each agent's optimal solution, corresponding to the uplink/downlink mode of each RAU and the maximum user resource utility function value.
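As a concrete reading of steps ①–⑧, a compact Python sketch of one RAU agent's WoLF-PHC update loop is given below. It is a hedged illustration: the hyperparameter values are assumptions, and the reward computation is stubbed because the closed forms of formulas (12) and (13) are not reproduced here.

```python
import numpy as np

class WoLFPHCAgent:
    """One RAU agent: 2 states (UL/DL mode) x 2 actions (switch/keep), per the text."""
    def __init__(self, alpha=0.1, gamma=0.9, delta_w=0.01, delta_l=0.04, rng=None):
        self.Q = np.zeros((2, 2))              # 2x2 Q table per agent
        self.pi = np.full((2, 2), 0.5)         # policy, initialised to 1/|A_i|
        self.pi_bar = np.full((2, 2), 0.5)     # average (estimation) policy
        self.C = np.zeros(2)                   # visit counts C(s)
        self.alpha, self.gamma = alpha, gamma
        self.delta_w, self.delta_l = delta_w, delta_l
        self.rng = rng or np.random.default_rng()

    def act(self, s):
        return self.rng.choice(2, p=self.pi[s])

    def update(self, s, a, reward, s_next):
        # Q-learning update (hedged reconstruction of the Q-value formula).
        self.Q[s, a] += self.alpha * (reward + self.gamma * self.Q[s_next].max() - self.Q[s, a])
        # Average-policy update.
        self.C[s] += 1
        self.pi_bar[s] += (self.pi[s] - self.pi_bar[s]) / self.C[s]
        # WoLF step size: cautious (delta_w) when winning, fast (delta_l) when losing.
        winning = self.pi[s] @ self.Q[s] > self.pi_bar[s] @ self.Q[s]
        delta = self.delta_w if winning else self.delta_l
        d_sa = np.minimum(self.pi[s], delta / (2 - 1))   # |A_i| = 2
        best = int(np.argmax(self.Q[s]))
        for a_i in range(2):
            if a_i == best:
                self.pi[s, a_i] += d_sa[np.arange(2) != a_i].sum()
            else:
                self.pi[s, a_i] -= d_sa[a_i]
        self.pi[s] = np.clip(self.pi[s], 0.0, 1.0)
        self.pi[s] /= self.pi[s].sum()          # keep a valid probability distribution

def train(agents, compute_reward, episodes=1000):
    """Distributed learning over M agents; compute_reward stands in for formulas (12)/(13)."""
    states = [0] * len(agents)                  # 0 = uplink reception, 1 = downlink transmission
    for _ in range(episodes):
        for m, ag in enumerate(agents):
            a = ag.act(states[m])
            s_next = 1 - states[m] if a == 0 else states[m]   # action 0: switch mode, 1: keep
            r = compute_reward(m, s_next, states)              # placeholder reward (assumption)
            ag.update(states[m], a, r, s_next)
            states[m] = s_next
    return states                               # converged uplink/downlink mode of each RAU
```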
Fig. 2 shows that the scalable network-assisted full-duplex cell-free massive MIMO scheme provided by the invention for the massive-terminal ultra-reliable low-latency communication (URLLC) scenario achieves a higher resource utility than the fixed-mode scheme that splits the RAUs equally between uplink and downlink and the time-division duplex (TDD) scheme, and comes close to the exhaustive-search algorithm that is theoretically optimal; the performance at the network-system level is slightly below exhaustive search because in some cases the algorithm converges to a Nash equilibrium policy. The complexity of the proposed algorithm is, however, much lower than that of exhaustive search, and compared with the centralized Q-learning scheme it needs less computation and storage and offers higher scalability, making it better suited to the massive-terminal URLLC scenario.
Claims (6)
1. A network-assisted full-duplex mode optimization method under massive-terminal URLLC, characterized in that the method is optimized with a WoLF-PHC-based intelligent algorithm and comprises the following steps:
step 1: defining a load-aware utility function for each user i:
where U_i is the load-aware utility function, used to characterize the resource utilization of the system; k is the number of allocatable resource blocks each remote antenna unit (RAU) has, K is the total number of users, n_{m,i} is the number of resource blocks allocated to user i by the m-th RAU (RAU m for short), n_{m,a} is the number of resource blocks allocated to user a by RAU m, and the summation term is the total number of resource blocks RAU m allocates to all users; n_{m,i} can be calculated by the following formula:
where the first quantity is the bandwidth user i requires according to its own quality of service (QoS), b is the bandwidth occupied by each resource block, γ_i is the signal-to-interference-plus-noise ratio (SINR) of user i, R_i is the short-packet achievable rate in the URLLC scenario, V(γ_i) is the channel dispersion of user i, m is the short-packet blocklength, Q^{-1}(·) is the inverse Q-function, ε_0 is the decoding error probability (DEP), e is the base of the natural logarithm, and ⌈·⌉ denotes rounding up;
Step 2: the optimization objective is to maximize the users' load-aware resource utility function:
where U_{U,i} is the load-aware utility function value of uplink user i, U_{D,j} is the load-aware utility function value of downlink user j, K_u is the number of uplink users, and K_d is the number of downlink users; the subscripts u and d identify the uplink and downlink, and i and j index the i-th uplink user and the j-th downlink user, respectively. To determine in which mode each remote antenna unit RAU should operate, two binary allocation vectors x_u, x_d ∈ {0,1}^{M×1} are defined, with M the total number of RAUs; the entry of x_u (or x_d) corresponding to an RAU takes the value 1 if that RAU is used for the uplink (or downlink), and 0 otherwise. The effective load-aware utility function values of the uplink and downlink can be represented by formulas (5) and (6), respectively:
where X_u = diag(x_u) and X_d = diag(x_d), with diag(a) denoting the diagonal matrix formed from the vector a; M_u is the number of uplink RAUs, M_d is the number of downlink RAUs, k_{U,m} is the number of resource blocks available for allocation at uplink RAU m, k_{D,m} is the number of resource blocks available for allocation at downlink RAU m, n_{m,i} is the number of resource blocks RAU m allocates to uplink user i when that user's QoS requirement is satisfied, and n_{m,j} is the number of resource blocks RAU m allocates to downlink user j when that user's QoS requirement is satisfied;
Step 3: optimize the resource utility function with the intelligent algorithm, and store the algorithm's final state set and rewards as the optimal RAU duplex modes and the maximized resource utilization efficiency.
2. The network-assisted full-duplex mode optimization method under massive-terminal URLLC according to claim 1, wherein the WoLF-PHC-based intelligent algorithm is as follows:
WoLF (Win or Learn Fast) means adjusting parameters cautiously and slowly when the agent is doing better than expected, and adjusting them quickly when it is doing worse than expected;
PHC (policy hill-climbing) is a learning algorithm for a single agent in a stationary environment; its core is the idea of ordinary reinforcement learning, increasing the selection probability of the action with the largest expected cumulative return, and the algorithm can converge to an optimal policy;
The WoLF-PHC intelligent algorithm is a scalable algorithm suited to distributed execution by multiple agents. It combines WoLF and PHC, so that an agent adjusts quickly to adapt to the policy changes of other agents when its rewards are worse than expected, and learns cautiously when its rewards are better than expected, giving the other agents time to adapt to its policy changes. The WoLF-PHC algorithm can converge to a Nash equilibrium policy, and when the other agents adopt fixed policies it converges to the optimal policy under the current conditions rather than to a possibly poor Nash equilibrium. The WoLF-PHC algorithm does not need to observe the policies, actions, or reward values of other agents, needs less space to record the Q values, and improves its policy by learning through the PHC algorithm, so no linear or quadratic programming is needed to solve for a Nash equilibrium, which speeds up the algorithm. In the massive-terminal URLLC scenario, distributed operation makes the algorithm logically scalable.
3. The network-assisted full-duplex mode optimization method under massive-terminal URLLC according to claim 2, wherein the WoLF-PHC algorithm maintains an average estimation policy whose update follows the following equation:
where π_i(s,a_i) is the policy for the given state-action pair and C(s) is the number of times state s has occurred; π_i(s,a_i) is updated as follows:
where Q(s,a) is the action-value function obtained by taking action a in state s, updated according to formula (8). The policy is updated by an increment or a decrement: when the currently selected action a_i is not the action that maximizes the Q value, the policy is decreased by δ_{sa_i}; when a_i is the action that maximizes the Q value, the policy is increased accordingly. The value of δ_{sa} in turn depends on the estimation policy π_i(s,a_i); δ is an update auxiliary parameter, |A_i| is the size of the action space, and the specific value of δ is given by (5): δ_w is the positive update auxiliary parameter used when the agent's rewards are better than expected, and δ_l is the negative update auxiliary parameter used when the agent's rewards are worse than expected;
In the WoLF-PHC-based intelligent algorithm, each remote antenna unit (RAU) in the system is treated independently as an agent; data detection and node mode selection are performed locally and need not be uploaded to the central processing unit (CPU) for centralized computation. For each agent, the state space has only two states, s_t ∈ {s_1, s_2}, where s_1 indicates the RAU operates in uplink reception and s_2 indicates it operates in downlink transmission; the action space has only two actions, a_t ∈ {a_1, a_2}, where a_1 means the RAU switches its current operating mode and a_2 means it keeps the current mode unchanged. The Q table is therefore of size 2×2, and if the total number of RAUs is M, storing all Q values requires only M×2×2 entries, which is far smaller than the 2^M×M Q-table storage required for centralized processing at the CPU, with correspondingly lower complexity.
4. The network-assisted full-duplex mode optimization method under massive-terminal URLLC according to claim 3, wherein when the RAU is in uplink reception mode the reward is given by the following formula:
and when the RAU is in downlink transmission mode the reward is given by the following formula:
5. The network-assisted full-duplex mode optimization method under massive-terminal URLLC according to claim 3, wherein the Q value is updated according to the following formula:
where α is the learning rate, s_t and a_t are the state and action at time t, and the reward R_{t+1} is the feedback the agent receives from the environment after taking action a_t in state s_t at time t; the discount factor γ defines the importance of future rewards, with a value of 0 meaning only short-term rewards are considered and a value closer to 1 placing more emphasis on long-term rewards.
6. The network-assisted full-duplex mode optimization method under massive-terminal URLLC according to claim 3, wherein the specific steps of the WoLF-PHC algorithm are as follows:
Step 1: with M the total number of RAUs, generate M all-zero Q tables of size 2×2 and initialize the channel state information, namely the channel vectors between all downlink RAUs and the downlink user j receiving the signal, the downlink precoding vectors, the channel vector G_{i,j} between the i-th uplink user and the j-th downlink user, the channels between the i-th uplink user and all uplink RAUs, and the true interference channel matrix G_I between the downlink RAUs and the uplink RAUs; initialize the learning rate α and the discount factor γ, the positive update auxiliary parameter δ_w and the negative update auxiliary parameter δ_l, the policy and the average estimation policy, where |A_i| is the size of the action space; initialize the number of occurrences of each state to C(s)=0;
Step 2: if the state of the RAU is currently uplink reception, select an action according to the policy and then compute the reward according to formula (6); if the state of the RAU is currently downlink transmission, select an action according to the policy and then compute the reward according to formula (7);
Step 3: the current state transitions to the next state according to the selected action;
Step 4: update the Q values in the Q table according to formula (8);
Step 5: for each action, update the average estimation policy according to formula (1);
Step 6: update the policy for each action according to the Q values and formulas (2)–(5);
Step 7: return to Step 2 and continue learning and training until the policy and the Q values in the Q table converge;
Step 8: return the state and reward of each agent's optimal solution, corresponding to the uplink/downlink mode of each RAU and the maximum user resource utility function value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210649515.9A CN115052355B (en) | 2022-06-09 | 2022-06-09 | Network-assisted full duplex mode optimization method under mass terminals URLLC |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210649515.9A CN115052355B (en) | 2022-06-09 | 2022-06-09 | Network-assisted full duplex mode optimization method under mass terminals URLLC |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115052355A CN115052355A (en) | 2022-09-13 |
CN115052355B (en) | 2024-07-05
Family
ID=83161112
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210649515.9A Active CN115052355B (en) | 2022-06-09 | 2022-06-09 | Network-assisted full duplex mode optimization method under mass terminals URLLC |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115052355B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110248402A (en) * | 2018-03-09 | 2019-09-17 | 华为技术有限公司 | A kind of Poewr control method and equipment |
CN110267338A (en) * | 2019-07-08 | 2019-09-20 | 西安电子科技大学 | Federated resource distribution and Poewr control method in a kind of D2D communication |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10452440B1 (en) * | 2016-06-07 | 2019-10-22 | PC Drivers Headquarters, Inc. | Systems and methods of optimized tuning of resources |
ES2975279T3 (en) * | 2017-09-28 | 2024-07-04 | Zte Corp | Method and systems for exchanging messages on a wireless network |
US11012112B2 (en) * | 2018-02-09 | 2021-05-18 | Qualcomm Incorporated | Techniques for flexible resource allocation |
CN114258138B (en) * | 2021-12-20 | 2024-07-05 | 东南大学 | Network-assisted full duplex mode optimization method based on load perception |
Also Published As
Publication number | Publication date |
---|---|
CN115052355A (en) | 2022-09-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |