
CN115052355B - Network-assisted full duplex mode optimization method under mass terminals URLLC - Google Patents

Network-assisted full duplex mode optimization method under mass terminals URLLC

Info

Publication number
CN115052355B
CN115052355B
Authority
CN
China
Prior art keywords
rau
downlink
algorithm
uplink
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210649515.9A
Other languages
Chinese (zh)
Other versions
CN115052355A (en)
Inventor
李佳珉
朱悦
朱鹏程
王东明
尤肖虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210649515.9A
Publication of CN115052355A
Application granted
Publication of CN115052355B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00 Arrangements affording multiple use of the transmission path
    • H04L5/14 Two-way operation using the same type of signal, i.e. duplex
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to a network-assisted full duplex mode optimization method under massive-terminal URLLC. Aiming at the problem of maximizing the uplink and downlink resource utilization efficiency of network-assisted full duplex in a cell-free massive MIMO scenario, it provides a scalable WoLF-PHC intelligent algorithm that runs in a distributed manner. Logical scalability is achieved by treating each remote antenna unit (RAU) as an agent containing a local processor, which performs its own associated data-processing tasks and optimizes a local performance metric based on the decisions of the other RAUs. When a new RAU joins the system, the computing power of the central processing unit (CPU) does not need to be upgraded and the data transmitted by all RAUs does not need to be retrained at the CPU; the distributed execution makes the system algorithm scalable. The proposed intelligent distributed algorithm is better suited to dynamic massive-terminal URLLC scenarios, has lower complexity, and requires less storage space than the traditional centralized Q-learning algorithm.

Description

Network-assisted full duplex mode optimization method under mass terminals URLLC
Technical Field
The invention relates to a scalable duplex mode optimization method for network-assisted full duplex in a cell-free massive MIMO scenario, applicable to ultra-reliable low-latency communication (URLLC) with massive terminals, and belongs to the technical field of mobile communication.
Background
Full duplex (FD) technology helps to improve system throughput and reduce system latency, which is important in ultra-reliable low-latency communication (URLLC) scenarios. Network-assisted full duplex (NAFD) under the cell-free distributed massive MIMO architecture is a flexible new duplex technology; the system comprises a central processing unit (CPU), multiple remote antenna units (RAUs) and multiple users. Each RAU can perform either uplink reception or downlink transmission, and the CPU decides which transmission mode each RAU uses. Compared with conventional time division duplexing, NAFD can provide low-latency services; compared with conventional frequency division duplexing, NAFD can support asymmetric traffic without degrading spectrum utilization. NAFD flexible duplexing can support URLLC for massive terminals: by letting the CPU schedule the uplink or downlink mode of each RAU, it can reduce the collision delay caused by grant-free access in URLLC and guarantee the reliability of user access; moreover, NAFD has no RAU self-interference, which removes the self-interference cancellation delay of conventional full duplex.
With the explosive growth in the number of mobile terminal users, the system resource utilization problem and the reliable, fast access mechanisms required by massive terminals remain to be studied. In massive-terminal URLLC scenarios, the node mode selection mechanism of NAFD flexible duplexing must be studied jointly with URLLC-specific factors such as short-packet transmission and bit error rate constraints. The scalability requirements that massive terminals and practical systems place on the algorithm also remain to be studied.
Disclosure of Invention
Technical problem: aiming at the problem of making load-aware network-assisted full duplex mode optimization applicable to massive-terminal URLLC so as to maximize the resource utilization of the system, the invention provides a network-assisted full duplex mode optimization method under massive-terminal URLLC.
The technical scheme is as follows: the network-assisted full duplex mode optimization method under massive-terminal URLLC provided by the invention adopts a WoLF-PHC-based intelligent algorithm for optimization and comprises the following steps:
Step 1: define a load-aware utility function U_i for each user i, used to characterize the resource utilization of the system. Here k is the number of allocatable resource blocks at each remote antenna unit (RAU), K is the total number of users, n_{m,i} is the number of resource blocks that the m-th RAU (RAU m for short) allocates to user i, n_{m,a} is the number it allocates to user a, and the sum term \sum_{a=1}^{K} n_{m,a} is therefore the total number of resource blocks RAU m has allocated to all users. n_{m,i} can be calculated by the following formula:

n_{m,i} = \lceil w_i / b \rceil

where w_i is the bandwidth required by user i according to its own quality of service (QoS) and b is the bandwidth occupied by each resource block. The required bandwidth follows from the short-packet achievable rate in the URLLC scenario,

R_i = \log_2(1+\gamma_i) - \sqrt{V(\gamma_i)/m}\, Q^{-1}(\varepsilon_0)\, \log_2 e, \qquad V(\gamma_i) = 1 - (1+\gamma_i)^{-2}

where \gamma_i is the signal-to-interference-plus-noise ratio (SINR) of user i, R_i is the short-packet achievable rate, V(\gamma_i) is the channel dispersion, m is the short-packet blocklength, Q^{-1}(\cdot) is the inverse of the Q-function, \varepsilon_0 is the decoding error probability (DEP), e is the base of the natural logarithm, and \lceil\cdot\rceil denotes rounding up;
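For illustration only, a minimal Python sketch of the step 1 quantities, assuming the reconstructed finite-blocklength rate above and assuming the QoS bandwidth w_i is derived from a target data rate; the names (short_packet_rate, rate_demand) are illustrative and not part of the patent:

```python
import math
from scipy.stats import norm

def short_packet_rate(gamma_i, m_block, eps0):
    """Short-packet achievable rate R_i (bits/s/Hz) at SINR gamma_i for
    blocklength m_block and decoding error probability eps0."""
    V = 1.0 - 1.0 / (1.0 + gamma_i) ** 2                 # dispersion V(gamma_i)
    return (math.log2(1.0 + gamma_i)
            - math.sqrt(V / m_block) * norm.isf(eps0) * math.log2(math.e))

def resource_blocks_needed(rate_demand, gamma_i, m_block, eps0, b_rb):
    """n_{m,i} = ceil(w_i / b): resource blocks RAU m must allocate to user i,
    with w_i the bandwidth (Hz) needed to carry rate_demand (bits/s) at R_i."""
    R_i = short_packet_rate(gamma_i, m_block, eps0)
    if R_i <= 0.0:
        return None                                      # QoS unreachable: the U_i = 0 case
    return math.ceil(rate_demand / R_i / b_rb)           # ceil(w_i / b)

# e.g. 1 Mbit/s demand, 10 (linear) SINR, 100-symbol packets, eps0 = 1e-5,
# 180 kHz resource blocks -> 2 resource blocks
print(resource_blocks_needed(1e6, 10.0, 100, 1e-5, 180e3))
```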
Step 2: the optimization objective is to maximize the users' load-aware resource utility functions:

\max_{x_u,\, x_d} \; \sum_{i=1}^{K_u} U_{U,i} + \sum_{j=1}^{K_d} U_{D,j}

where U_{U,i} is the load-aware utility function value of the i-th uplink user, U_{D,j} is that of the j-th downlink user, K_u is the number of uplink users and K_d is the number of downlink users; the subscripts u and d identify the uplink and the downlink, and i and j index the i-th uplink user and the j-th downlink user, respectively. To determine in which mode each RAU should operate, two binary allocation vectors x_u, x_d \in \{0,1\}^{M\times 1} are defined, with M the total number of RAUs and x_u + x_d = 1_M so that each RAU works in exactly one mode; the m-th element of x_u (respectively x_d) takes the value 1 if RAU m is used for the uplink (respectively the downlink), and 0 otherwise. The effective load-aware utility function values of the uplink and downlink can be represented by equations (5) and (6), respectively:

where X_u = diag(x_u) and X_d = diag(x_d), and diag(a) denotes the diagonal matrix formed from a; M_u is the number of uplink RAUs, M_d is the number of downlink RAUs, k_{U,m} is the number of resource blocks available for allocation at uplink RAU m, k_{D,m} is that at downlink RAU m, n_{m,i} is the number of resource blocks that RAU m allocates to uplink user i when user i's QoS requirement is satisfied, and n_{m,j} is the number that RAU m allocates to downlink user j when user j's QoS requirement is satisfied;
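Since each RAU takes one of two modes, step 2 is a search over 2^M binary assignments. As a reference point, the exhaustive-search baseline that the embodiment later compares against (Fig. 2) can be sketched as below, where utility_fn is an assumed callback evaluating the effective uplink and downlink utilities for given masks:

```python
from itertools import product
import numpy as np

def best_assignment(M, utility_fn):
    """Exhaustive search over all 2^M duplex assignments. utility_fn(x_u, x_d)
    is assumed to return the total load-aware utility for the given 0/1 masks."""
    best_val, best_x = -np.inf, None
    for bits in product((0, 1), repeat=M):
        x_u = np.array(bits)            # x_u[m] = 1 -> RAU m does uplink reception
        x_d = 1 - x_u                   # complement: downlink transmission
        val = utility_fn(x_u, x_d)
        if val > best_val:
            best_val, best_x = val, (x_u, x_d)
    return best_val, best_x
```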
Step 3: and optimizing the resource utility function by using an intelligent algorithm, and storing a final state set and rewards of the algorithm as an optimal RAU duplex mode and maximized resource utilization efficiency.
Wherein the WoLF-PHC-based intelligent algorithm is as follows:
WoLF ("Win or Learn Fast") means that the agent adjusts its parameters cautiously and slowly when it is doing better than expected, and quickly when it is doing worse than expected;
PHC (policy hill-climbing) is a learning algorithm for a single agent in a stationary environment; its core is the idea of ordinary reinforcement learning: it increases the selection probability of the action that obtains the maximum cumulative expected reward, and the algorithm can converge to an optimal strategy;
the WoLF-PHC intelligent algorithm is a scalable algorithm suited to distributed execution by multiple agents. It combines the WoLF and PHC algorithms, so that an agent adapts quickly to the strategy changes of other agents when its reward is worse than expected, and learns cautiously when its reward is better than expected, giving the other agents time to adapt to its strategy changes. The WoLF-PHC algorithm can converge to a Nash equilibrium strategy, and when the other agents adopt fixed strategies it instead converges to the optimal strategy under the current conditions rather than to a possibly worse Nash equilibrium strategy. The WoLF-PHC algorithm does not need to observe the strategies, actions and reward values of other agents, needs less space to record Q values, and improves its strategy through PHC learning, so no linear or quadratic programming is needed to solve for a Nash equilibrium, which improves the algorithm's speed. In the massive-terminal URLLC scenario, distributed operation makes the algorithm logically scalable.
The WoLF-PHC algorithm maintains an average estimation strategy \bar{\pi}_i(s, a_i), whose update follows:

\bar{\pi}_i(s, a_i) \leftarrow \bar{\pi}_i(s, a_i) + \frac{1}{C(s)} \big( \pi_i(s, a_i) - \bar{\pi}_i(s, a_i) \big), \quad \forall a_i \in A_i

where \pi_i(s, a_i) is the strategy taken under the particular state-action pair and C(s) is the number of times state s has occurred. \pi_i(s, a_i) is updated as follows:

\pi_i(s, a_i) \leftarrow \pi_i(s, a_i) + \Delta_{s a_i}

where Q(s, a) denotes the value function obtained by taking action a in state s, updated according to formula (8). \Delta_{s a_i} is the increment or decrement of the strategy update: when the currently selected action a_i is not the action that maximizes the Q value, the strategy is updated with the decrement \Delta_{s a_i} = -\delta_{s a_i}; if a_i is the action that maximizes the Q value, it is updated with the increment \Delta_{s a_i} = \sum_{a' \neq a_i} \delta_{s a'}. The value \delta_{s a_i} in turn depends on the strategy \pi_i(s, a_i) through

\delta_{s a_i} = \min\big( \pi_i(s, a_i),\ \delta / (|A_i| - 1) \big)

where \delta is an update auxiliary parameter and |A_i| is the size of the action space; \delta takes the positive update auxiliary parameter \delta_w when the reward obtained by the agent is better than expected, and the negative update auxiliary parameter \delta_l when the reward obtained by the agent is worse than expected:

\delta = \begin{cases} \delta_w, & \text{if } \sum_{a} \pi_i(s, a)\, Q(s, a) > \sum_{a} \bar{\pi}_i(s, a)\, Q(s, a) \\ \delta_l, & \text{otherwise} \end{cases}
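The policy update above can be rendered compactly in Python; this is a sketch of the reconstruction given here, with an illustrative array layout (rows are states, columns are actions) and in-place updates:

```python
import numpy as np

def wolf_phc_update(Q, pi, pi_avg, C, s, delta_w, delta_l):
    """One WoLF-PHC policy step in state s: update the average strategy,
    pick the win/lose step size, then hill-climb pi toward the greedy action.
    Q, pi, pi_avg have shape [n_states, n_actions]; C has shape [n_states]."""
    n_actions = pi.shape[1]
    C[s] += 1
    pi_avg[s] += (pi[s] - pi_avg[s]) / C[s]         # average-strategy update
    winning = pi[s] @ Q[s] > pi_avg[s] @ Q[s]       # better than the average policy?
    delta = delta_w if winning else delta_l         # win -> cautious, lose -> fast
    d = np.minimum(pi[s], delta / (n_actions - 1))  # per-action caps delta_{s,a}
    greedy = int(np.argmax(Q[s]))
    pi[s] -= d                                      # decrement every action...
    pi[s, greedy] += d.sum()                        # ...and move all the mass to greedy
    pi[s] /= pi[s].sum()                            # guard against rounding drift
```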
In the WoLF-PHC-based intelligent algorithm, each remote antenna unit (RAU) in the system is treated independently as an agent and performs data detection and node mode selection locally, without uploading to the central processing unit (CPU) for centralized computation. For each agent the state space has only two states s_t ∈ {s_1, s_2}: s_1 indicates that the RAU's working mode is uplink reception and s_2 that it is downlink transmission. The action space is likewise set to two actions a_t ∈ {a_1, a_2}: a_1 means the RAU changes its current working mode and a_2 means it keeps it unchanged. The Q table is therefore of size 2×2, and if the total number of RAUs is M, storing all Q values needs only M×2×2 entries, far smaller than the 2^M × M Q-table storage required for centralized processing at the CPU, and the complexity is lower.
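A quick check of the storage comparison above (entry counts, not bytes); the helper is hypothetical:

```python
def q_table_entries(M):
    """Distributed WoLF-PHC: one 2x2 Q table per RAU (M*2*2 entries).
    Centralized Q-learning over joint RAU modes: 2^M states x M actions."""
    return M * 2 * 2, (2 ** M) * M

# e.g. M = 6 RAUs as in the embodiment: (24, 384)
print(q_table_entries(6))
```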
When the RAU is in uplink reception, the reward is given by the following formula:
When the RAU is in downlink transmission, the reward is given by the following formula:
The Q value is updated according to the following formula:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ R_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \big]

where α is the learning rate, s_t and a_t are the state and action at time t, and the reward R_{t+1} is the feedback the agent obtains from the environment after taking action a_t in state s_t at time t. The discount factor γ defines the importance of future rewards: a value of 0 means only short-term rewards are considered, while a value of 1 places more emphasis on long-term rewards.
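The same update in code form, with an illustrative tabular layout:

```python
import numpy as np

def q_update(Q, s, a, reward, s_next, alpha, gamma):
    """Tabular Q-learning update: move Q(s_t, a_t) toward
    R_{t+1} + gamma * max_a Q(s_{t+1}, a). Q has shape [n_states, n_actions]."""
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
```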
The specific steps of the WoLF-PHC algorithm are as follows:
Step 1: let M be the total number of RAUs; generate M all-zero Q tables of size 2×2 and initialize the channel state information: \hat{h}_j, the channel vector between all downlink RAUs and the downlink user j receiving the signal; w_j, the downlink precoding vector; g_{i,j}, the channel vector between the i-th uplink user and the j-th downlink user; \hat{h}_i, the channels between the i-th uplink user and all uplink RAUs; and G_I, the real interference channel matrix between the downlink RAUs and the uplink RAUs. Initialize the learning rate α and the discount factor γ, and initialize the positive update auxiliary parameter δ_w and the negative update auxiliary parameter δ_l. Initialize the strategy π_i(s, a_i) = 1/|A_i| and the average estimation strategy \bar{π}_i(s, a_i) = 1/|A_i|, where |A_i| is the size of the action space, and initialize the number of occurrences of each state to C(s) = 0;
Step 2: if the state of the RAU is uplink reception, select an action according to the strategy and then calculate the reward according to formula (6); if the state of the RAU is downlink transmission, select an action according to the strategy and then calculate the reward according to formula (7);
Step 3: the current state jumps to the next state according to the selected action;
Step 4: update the Q values in the Q table according to formula (8);
Step 5: for each action, update the average estimation strategy according to formula (1);
Step 6: update the strategy according to the Q values and each action by formulas (2)-(5);
Step 7: return to Step 2 for learning and training until the strategy and the Q values converge;
Step 8: return each agent's optimal-solution state and reward, which correspond to each RAU's uplink or downlink mode and the maximized user resource utility function value.
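Putting steps 1 to 8 together, a minimal self-contained sketch of the distributed loop; reward_fn is an assumed environment callback returning each RAU's reward under formulas (6) and (7), and all hyperparameter values are illustrative:

```python
import numpy as np

def run_wolf_phc(M, reward_fn, episodes=500, alpha=0.1, gamma=0.9,
                 delta_w=0.05, delta_l=0.2, seed=0):
    """Distributed WoLF-PHC over M RAU agents (steps 1-8 above).
    mode/state: 0 = uplink reception (s1), 1 = downlink transmission (s2);
    action: 0 = switch mode (a1), 1 = keep mode (a2).
    reward_fn(modes) is an assumed callback returning per-RAU rewards."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((M, 2, 2))                     # step 1: all-zero 2x2 Q tables
    pi = np.full((M, 2, 2), 0.5)                # pi_i = 1/|A_i|
    pi_avg = np.full((M, 2, 2), 0.5)            # average estimation strategies
    C = np.zeros((M, 2))                        # state visit counts C(s)
    modes = rng.integers(0, 2, size=M)          # arbitrary initial duplex modes
    for _ in range(episodes):
        acts = np.array([rng.choice(2, p=pi[m, modes[m]]) for m in range(M)])
        nxt = np.where(acts == 0, 1 - modes, modes)   # step 3: a1 switches mode
        rewards = reward_fn(nxt)                      # step 2: formulas (6)/(7)
        for m in range(M):
            s, a, s2 = modes[m], acts[m], nxt[m]
            # step 4: Q-learning update (formula (8))
            Q[m, s, a] += alpha * (rewards[m] + gamma * Q[m, s2].max() - Q[m, s, a])
            # step 5: average-strategy update (formula (1))
            C[m, s] += 1
            pi_avg[m, s] += (pi[m, s] - pi_avg[m, s]) / C[m, s]
            # step 6: WoLF step size, then policy hill-climbing (formulas (2)-(5))
            win = pi[m, s] @ Q[m, s] > pi_avg[m, s] @ Q[m, s]
            d = np.minimum(pi[m, s], (delta_w if win else delta_l) / (2 - 1))
            g = int(np.argmax(Q[m, s]))
            pi[m, s] -= d
            pi[m, s, g] += d.sum()
        modes = nxt                                   # step 7: repeat to convergence
    return modes                                      # step 8: learned duplex modes
```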
Beneficial effects: the invention provides a scalable duplex mode optimization method for network-assisted full duplex cell-free massive MIMO suitable for ultra-reliable low-latency communication (URLLC) with massive terminals, and, for the problem of maximizing the uplink and downlink resource utilization efficiency of network-assisted full duplex in a cell-free massive MIMO scenario, proposes a strongly scalable WoLF-PHC intelligent algorithm with multi-agent distributed operation. Logical scalability is achieved by treating each RAU as an agent containing a local processor, which performs its own associated data-processing tasks and optimizes a local performance metric based on the decisions of the other RAUs. When a new remote antenna unit (RAU) joins the system, the computing power of the CPU does not need to be upgraded and the data transmitted by all RAUs does not need to be retrained at the CPU; the distributed execution makes the system algorithm scalable. The proposed intelligent distributed algorithm is better suited to dynamic massive-terminal URLLC scenarios, has lower complexity, and requires less storage space than the traditional centralized Q-learning algorithm.
Drawings
The drawings show the scenario built for the example problem and a comparison of the resource utility function under the WoLF-PHC algorithm and other algorithms.
Fig. 1 is a position distribution diagram of uniformly distributed RAUs and randomly distributed uplink and downlink users;
FIG. 2 is a graph comparing CDFs of resource utility functions under different algorithms.
Fig. 3 is a schematic flow chart of the present invention.
Detailed Description
The invention is described in detail below with reference to examples:
Assume a cell-free massive MIMO scenario in which M RAUs are evenly distributed within a circle with a radius of 600 m. The system contains K randomly distributed users, comprising K_u uplink users and K_d downlink users, randomly distributed within a circle with a radius of 1000 m. Assume M = 6, K_u = 20 and K_d = 20; the specific scene distribution is shown in Fig. 1 of the drawings. The noise power is set to -90 dBm, the uplink transmit power is 30 dBm, the downlink transmit power is 23 dBm, and the path loss is 128.1 + 37.6 log10(d) dB.
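A sketch of this deployment in Python, under two flagged assumptions: "evenly distributed" is read as equally spaced on the 600 m circle, and d in the path-loss formula is in kilometres (the usual 3GPP convention); all names are illustrative:

```python
import numpy as np

def build_scenario(M=6, K_u=20, K_d=20, seed=1):
    """RAU and user drop for the example: M RAUs on a 600 m circle,
    K_u + K_d users uniform in a 1000 m disk; returns positions and the
    M x K path-loss matrix in dB."""
    rng = np.random.default_rng(seed)
    ang = 2.0 * np.pi * np.arange(M) / M
    raus = 600.0 * np.stack([np.cos(ang), np.sin(ang)], axis=1)
    K = K_u + K_d
    r = 1000.0 * np.sqrt(rng.random(K))            # sqrt -> uniform over the disk
    th = 2.0 * np.pi * rng.random(K)
    users = np.stack([r * np.cos(th), r * np.sin(th)], axis=1)
    d_km = np.linalg.norm(raus[:, None] - users[None, :], axis=2) / 1000.0
    pl_db = 128.1 + 37.6 * np.log10(d_km)          # path loss in dB
    return raus, users[:K_u], users[K_u:], pl_db
```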
The implementation method of the invention in the system is as follows:
(1) Define a load-aware utility function U_i for each user i, used to characterize the resource utilization of the system. Here k is the number of allocatable resource blocks at each remote antenna unit (RAU) and K is the total number of users; n_{m,i} is the number of resource blocks allocated to user i by RAU m, which can be calculated by the following equation:

n_{m,i} = \lceil w_i / b \rceil

where w_i is the bandwidth required by user i according to its quality of service (QoS), b is the bandwidth occupied by each resource block, and the short-packet achievable rate in the URLLC scenario is

R_i = \log_2(1+\gamma_i) - \sqrt{V(\gamma_i)/m}\, Q^{-1}(\varepsilon_0)\, \log_2 e, \qquad V(\gamma_i) = 1 - (1+\gamma_i)^{-2}

where \gamma_i is the SINR of user i, V(\gamma_i) is the channel dispersion, m is the short-packet blocklength, Q^{-1}(\cdot) is the inverse of the Q-function, \varepsilon_0 is the decoding error probability (DEP), e is the base of the natural logarithm, and \lceil\cdot\rceil denotes rounding up. By the property of the logarithmic function, when the whole network is overloaded a user preferentially selects the RAU that has allocated fewer resource blocks, provided its quality of service can still be met, which improves the utilization of the system's resource blocks as a whole. As the overall network load increases, the value of the user's load-aware utility function decreases; conversely, as the number of allocatable resource blocks owned by the RAU increases, the value of the load-aware utility function increases. If an RAU cannot guarantee user i's quality of service, it provides no resource blocks to user i and U_i = 0. U_i can therefore be used as a load-aware utility function to characterize the resource utilization of the system.
(2) The optimization objective is to maximize the users' load-aware resource utility functions:

\max_{x_u,\, x_d} \; \sum_{i=1}^{K_u} U_{U,i} + \sum_{j=1}^{K_d} U_{D,j}

where U_{U,i} is the load-aware utility function value of the i-th uplink user, U_{D,j} is that of the j-th downlink user, K_u is the number of uplink users and K_d is the number of downlink users; the subscripts u and d identify the uplink and the downlink, and i and j index the i-th uplink user and the j-th downlink user, respectively. To determine in which mode each remote antenna unit (RAU) should operate, two binary allocation vectors x_u, x_d \in \{0,1\}^{M\times 1} are defined, with M the total number of RAUs and x_u + x_d = 1_M so that each RAU works in exactly one mode; the m-th element of x_u (respectively x_d) takes the value 1 if RAU m is used for the uplink (respectively the downlink), and 0 otherwise. The effective load-aware utility function values of the uplink and downlink can be represented by equations (5) and (6), respectively:

where X_u = diag(x_u) and X_d = diag(x_d), and diag(a) denotes the diagonal matrix formed from a; M_u is the number of uplink RAUs, M_d is the number of downlink RAUs, k_{U,m} is the number of resource blocks available for allocation at uplink RAU m, k_{D,m} is that at downlink RAU m, n_{m,i} is the number of resource blocks that RAU m allocates to uplink user i when user i's QoS requirement is satisfied, and n_{m,j} is the number that RAU m allocates to downlink user j when user j's QoS requirement is satisfied.
(3) Optimize the resource utility function with the intelligent algorithm, and store the final state set and rewards of the algorithm as the optimal RAU duplex modes and the maximized resource utilization efficiency.
In order to realize the load-aware network-assisted full duplex mode optimization technique in massive-terminal URLLC scenarios, an intelligent algorithm based on WoLF-PHC Q-learning is provided:
WoLF ("Win or Learn Fast") means that the agent adjusts its parameters cautiously and slowly when it is doing better than expected, and quickly when it is doing worse than expected.
PHC (policy hill-climbing) is a learning algorithm for a single agent in a stationary environment. Its core is the idea of ordinary reinforcement learning: it increases the selection probability of the action that obtains the maximum cumulative expected reward. The algorithm is rational and can converge to an optimal strategy.
The WoLF-PHC intelligent algorithm is a scalable algorithm suited to distributed execution by multiple agents. It combines the WoLF and PHC algorithms, so that an agent adapts quickly to the strategy changes of other agents when its reward is worse than expected, and learns cautiously when its reward is better than expected, giving the other agents time to adapt to its strategy changes. The WoLF-PHC algorithm can converge to a Nash equilibrium strategy, and when the other agents adopt fixed strategies it instead converges to the optimal strategy under the current conditions rather than to a possibly worse Nash equilibrium strategy. It does not need to observe the strategies, actions and reward values of other agents, needs less space to record Q values, and improves its strategy through PHC learning, so no linear or quadratic programming is needed to solve for a Nash equilibrium, which improves the algorithm's speed. In the massive-terminal URLLC scenario, distributed operation makes the algorithm logically scalable. The average estimation strategy \bar{\pi}_i(s, a_i) is updated as:

\bar{\pi}_i(s, a_i) \leftarrow \bar{\pi}_i(s, a_i) + \frac{1}{C(s)} \big( \pi_i(s, a_i) - \bar{\pi}_i(s, a_i) \big), \quad \forall a_i \in A_i

where \pi_i(s, a_i) is the strategy taken under the particular state-action pair and C(s) is the number of times state s has occurred. \pi_i(s, a_i) is updated as follows:

\pi_i(s, a_i) \leftarrow \pi_i(s, a_i) + \Delta_{s a_i}

where Q(s, a) represents the value function obtained by taking action a in state s, updated according to formula (14).
In the WoLF-PHC algorithm, each RAU in the system is treated independently as an agent and performs data detection and node mode selection locally, without uploading to the central processing unit (CPU) for centralized computation. For each agent the state space has only two states s_t ∈ {s_1, s_2}: s_1 indicates that the RAU's working mode is uplink reception and s_2 that it is downlink transmission. The action space likewise has only two actions a_t ∈ {a_1, a_2}: a_1 means the RAU changes its current working mode and a_2 means it keeps it unchanged. The Q table is therefore of size 2×2, and if the total number of RAUs is M, storing all Q values needs only M×2×2 entries, far smaller than the 2^M × M Q-table storage required for centralized processing at the CPU, and the complexity is lower. When the RAU is in the uplink reception mode, the reward is given by formula (12); when the RAU is in the downlink transmission mode, the reward is given by formula (13). The Q value is updated as:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ R_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \big]

where α is the learning rate, s_t and a_t are the state and action at time t, and the reward R_{t+1} is the feedback the agent obtains from the environment after taking action a_t in state s_t at time t. The discount factor γ defines the importance of future rewards: a value of 0 means only short-term rewards are considered, while a value of 1 places more emphasis on long-term rewards. The specific steps of the algorithm are as follows:
① Let M be the total number of RAUs; generate M all-zero Q tables of size 2×2 and initialize the channel state information: \hat{h}_j, the channel vector between all downlink RAUs and the downlink user j receiving the signal; w_j, the downlink precoding vector; g_{i,j}, the channel vector between the i-th uplink user and the j-th downlink user; \hat{h}_i, the channels between the i-th uplink user and all uplink RAUs; and G_I, the real interference channel matrix between the downlink RAUs and the uplink RAUs. Initialize the strategy π_i(s, a_i) = 1/|A_i| and the average estimation strategy \bar{π}_i(s, a_i) = 1/|A_i|, where |A_i| is the size of the action space, and initialize C(s) = 0;
② if the state of the RAU is uplink reception, select an action according to the strategy and then calculate the reward according to formula (12); if the state of the RAU is downlink transmission, select an action according to the strategy and then calculate the reward according to formula (13);
③ the current state jumps to the next state according to the selected action;
④ update the Q values in the Q table according to formula (14);
⑤ for each action, update the average estimation strategy according to equation (7);
⑥ update the strategy according to the Q values and each action by formulas (8)-(11);
⑦ return to step ② for learning and training until the strategy and the Q values converge;
⑧ return each agent's optimal-solution state and reward, which correspond to each RAU's uplink or downlink mode and the maximized user resource utility function value.
Fig. 2 shows that the scalable network-assisted full duplex cell-free massive MIMO scheme provided by the invention for massive-terminal ultra-reliable low-latency communication (URLLC) achieves a higher resource utility than the fixed-mode scheme that splits the RAUs equally between uplink and downlink and than the time division duplex (TDD) scheme, and comes close to the exhaustive search algorithm that is theoretically performance-optimal; the network-level performance is slightly below exhaustive search because in some cases the algorithm converges to a Nash equilibrium strategy. The complexity of the proposed algorithm is much lower than that of exhaustive search, and compared with the centralized Q-learning scheme it requires less computation and storage space and has higher scalability, making it better suited to massive-terminal URLLC scenarios.

Claims (6)

1. A network-assisted full duplex mode optimization method under massive-terminal URLLC, characterized in that the method adopts a WoLF-PHC-based intelligent algorithm for optimization and comprises the following steps:
Step 1: define a load-aware utility function U_i for each user i, used to characterize the resource utilization of the system. Here k is the number of allocatable resource blocks at each remote antenna unit (RAU), K is the total number of users, n_{m,i} is the number of resource blocks that the m-th RAU (RAU m for short) allocates to user i, n_{m,a} is the number it allocates to user a, and the sum term \sum_{a=1}^{K} n_{m,a} is therefore the total number of resource blocks RAU m has allocated to all users. n_{m,i} can be calculated by the following formula:

n_{m,i} = \lceil w_i / b \rceil

where w_i is the bandwidth required by user i according to its own quality of service (QoS) and b is the bandwidth occupied by each resource block. The required bandwidth follows from the short-packet achievable rate in the URLLC scenario,

R_i = \log_2(1+\gamma_i) - \sqrt{V(\gamma_i)/m}\, Q^{-1}(\varepsilon_0)\, \log_2 e, \qquad V(\gamma_i) = 1 - (1+\gamma_i)^{-2}

where \gamma_i is the signal-to-interference-plus-noise ratio (SINR) of user i, R_i is the short-packet achievable rate, V(\gamma_i) is the channel dispersion, m is the short-packet blocklength, Q^{-1}(\cdot) is the inverse of the Q-function, \varepsilon_0 is the decoding error probability (DEP), e is the base of the natural logarithm, and \lceil\cdot\rceil denotes rounding up;
Step 2: the optimization objective is to maximize the users' load-aware resource utility functions:

\max_{x_u,\, x_d} \; \sum_{i=1}^{K_u} U_{U,i} + \sum_{j=1}^{K_d} U_{D,j}

where U_{U,i} is the load-aware utility function value of the i-th uplink user, U_{D,j} is that of the j-th downlink user, K_u is the number of uplink users and K_d is the number of downlink users; the subscripts u and d identify the uplink and the downlink, and i and j index the i-th uplink user and the j-th downlink user, respectively. To determine in which mode each RAU should operate, two binary allocation vectors x_u, x_d \in \{0,1\}^{M\times 1} are defined, with M the total number of RAUs and x_u + x_d = 1_M so that each RAU works in exactly one mode; the m-th element of x_u (respectively x_d) takes the value 1 if RAU m is used for the uplink (respectively the downlink), and 0 otherwise. The effective load-aware utility function values of the uplink and downlink can be represented by equations (5) and (6), respectively:

where X_u = diag(x_u) and X_d = diag(x_d), and diag(a) denotes the diagonal matrix formed from a; M_u is the number of uplink RAUs, M_d is the number of downlink RAUs, k_{U,m} is the number of resource blocks available for allocation at uplink RAU m, k_{D,m} is that at downlink RAU m, n_{m,i} is the number of resource blocks that RAU m allocates to uplink user i if user i's QoS requirement is satisfied, and n_{m,j} is the number that RAU m allocates to downlink user j if user j's QoS requirement is satisfied;
Step 3: and optimizing the resource utility function by using an intelligent algorithm, and storing a final state set and rewards of the algorithm as an optimal RAU duplex mode and maximized resource utilization efficiency.
2. The method for optimizing the network-assisted full duplex mode under the mass terminal URLLC according to claim 1, wherein the intelligent algorithm based on WoLF-PHC is:
WoLF ("Win or Learn Fast") means that the agent adjusts its parameters cautiously and slowly when it is doing better than expected, and quickly when it is doing worse than expected;
PHC (policy hill-climbing) is a learning algorithm for a single agent in a stationary environment; its core is the idea of ordinary reinforcement learning: it increases the selection probability of the action that obtains the maximum cumulative expected reward, and the algorithm can converge to an optimal strategy;
the WoLF-PHC intelligent algorithm is a scalable algorithm suited to distributed execution by multiple agents. It combines the WoLF and PHC algorithms, so that an agent adapts quickly to the strategy changes of other agents when its reward is worse than expected, and learns cautiously when its reward is better than expected, giving the other agents time to adapt to its strategy changes. The WoLF-PHC algorithm can converge to a Nash equilibrium strategy, and when the other agents adopt fixed strategies it instead converges to the optimal strategy under the current conditions rather than to a possibly worse Nash equilibrium strategy. The WoLF-PHC algorithm does not need to observe the strategies, actions and reward values of other agents, needs less space to record Q values, and improves its strategy through PHC learning, so no linear or quadratic programming is needed to solve for a Nash equilibrium, which improves the algorithm's speed. In the massive-terminal URLLC scenario, distributed operation makes the algorithm logically scalable.
3. The method for optimizing the network-assisted full duplex mode under the massive terminal URLLC according to claim 2, wherein the WoLF-PHC algorithm maintains an average estimation strategy \bar{\pi}_i(s, a_i), whose update follows:

\bar{\pi}_i(s, a_i) \leftarrow \bar{\pi}_i(s, a_i) + \frac{1}{C(s)} \big( \pi_i(s, a_i) - \bar{\pi}_i(s, a_i) \big), \quad \forall a_i \in A_i

where \pi_i(s, a_i) is the strategy taken under the particular state-action pair and C(s) is the number of times state s has occurred. \pi_i(s, a_i) is updated as follows:

\pi_i(s, a_i) \leftarrow \pi_i(s, a_i) + \Delta_{s a_i}

where Q(s, a) denotes the value function obtained by taking action a in state s, updated according to formula (8). \Delta_{s a_i} is the increment or decrement of the strategy update: when the currently selected action a_i is not the action that maximizes the Q value, the strategy is updated with the decrement \Delta_{s a_i} = -\delta_{s a_i}; if a_i is the action that maximizes the Q value, it is updated with the increment \Delta_{s a_i} = \sum_{a' \neq a_i} \delta_{s a'}. The value \delta_{s a_i} in turn depends on the strategy \pi_i(s, a_i) through

\delta_{s a_i} = \min\big( \pi_i(s, a_i),\ \delta / (|A_i| - 1) \big)

where \delta is an update auxiliary parameter and |A_i| is the size of the action space; \delta takes the positive update auxiliary parameter \delta_w when the reward obtained by the agent is better than expected, and the negative update auxiliary parameter \delta_l when the reward obtained by the agent is worse than expected:

\delta = \begin{cases} \delta_w, & \text{if } \sum_{a} \pi_i(s, a)\, Q(s, a) > \sum_{a} \bar{\pi}_i(s, a)\, Q(s, a) \\ \delta_l, & \text{otherwise} \end{cases}
In the WoLF-PHC-based intelligent algorithm, each remote antenna unit (RAU) in the system is treated independently as an agent and performs data detection and node mode selection locally, without uploading to the central processing unit (CPU) for centralized computation. For each agent the state space has only two states s_t ∈ {s_1, s_2}: s_1 indicates that the RAU's working mode is uplink reception and s_2 that it is downlink transmission. The action space is likewise set to two actions a_t ∈ {a_1, a_2}: a_1 means the RAU changes its current working mode and a_2 means it keeps it unchanged. The Q table is therefore of size 2×2, and if the total number of RAUs is M, storing all Q values needs only M×2×2 entries, far smaller than the 2^M × M Q-table storage required for centralized processing at the CPU, and the complexity is lower.
4. The method for optimizing the network-assisted full duplex mode under the massive terminal URLLC according to claim 3, wherein when the RAU is in uplink reception, the reward is given by the following formula:
and when the RAU is in downlink transmission, the reward is given by the following formula:
5. The method for optimizing the network-assisted full duplex mode under the massive terminal URLLC according to claim 3, wherein the Q value is updated according to the following formula:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ R_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \big]

where α is the learning rate, s_t and a_t are the state and action at time t, and the reward R_{t+1} is the feedback the agent obtains from the environment after taking action a_t in state s_t at time t; the discount factor γ defines the importance of future rewards: a value of 0 means only short-term rewards are considered, while a value of 1 places more emphasis on long-term rewards.
6. The method for optimizing the network-assisted full duplex mode under the mass terminal URLLC according to claim 3, wherein the specific steps of the WoLF-PHC algorithm are as follows:
Step 1: let M be the total number of RAUs; generate M all-zero Q tables of size 2×2 and initialize the channel state information: \hat{h}_j, the channel vector between all downlink RAUs and the downlink user j receiving the signal; w_j, the downlink precoding vector; g_{i,j}, the channel vector between the i-th uplink user and the j-th downlink user; \hat{h}_i, the channels between the i-th uplink user and all uplink RAUs; and G_I, the real interference channel matrix between the downlink RAUs and the uplink RAUs. Initialize the learning rate α and the discount factor γ, and initialize the positive update auxiliary parameter δ_w and the negative update auxiliary parameter δ_l. Initialize the strategy π_i(s, a_i) = 1/|A_i| and the average estimation strategy \bar{π}_i(s, a_i) = 1/|A_i|, where |A_i| is the size of the action space, and initialize the number of occurrences of each state to C(s) = 0;
Step 2: if the state of the RAU is uplink reception, select an action according to the strategy and then calculate the reward according to formula (6); if the state of the RAU is downlink transmission, select an action according to the strategy and then calculate the reward according to formula (7);
Step 3: the current state jumps to the next state according to the selected action;
Step 4: update the Q values in the Q table according to formula (8);
Step 5: for each action, update the average estimation strategy according to formula (1);
Step 6: update the strategy according to the Q values and each action by formulas (2)-(5);
Step 7: return to Step 2 for learning and training until the strategy and the Q values converge;
Step 8: return each agent's optimal-solution state and reward, which correspond to each RAU's uplink or downlink mode and the maximized user resource utility function value.
CN202210649515.9A 2022-06-09 2022-06-09 Network-assisted full duplex mode optimization method under mass terminals URLLC Active CN115052355B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210649515.9A CN115052355B (en) 2022-06-09 2022-06-09 Network-assisted full duplex mode optimization method under mass terminals URLLC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210649515.9A CN115052355B (en) 2022-06-09 2022-06-09 Network-assisted full duplex mode optimization method under mass terminals URLLC

Publications (2)

Publication Number Publication Date
CN115052355A CN115052355A (en) 2022-09-13
CN115052355B (en) 2024-07-05

Family

ID=83161112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210649515.9A Active CN115052355B (en) 2022-06-09 2022-06-09 Network-assisted full duplex mode optimization method under mass terminals URLLC

Country Status (1)

Country Link
CN (1) CN115052355B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110248402A (en) * 2018-03-09 2019-09-17 华为技术有限公司 A kind of Poewr control method and equipment
CN110267338A (en) * 2019-07-08 2019-09-20 西安电子科技大学 Federated resource distribution and Poewr control method in a kind of D2D communication

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10452440B1 (en) * 2016-06-07 2019-10-22 PC Drivers Headquarters, Inc. Systems and methods of optimized tuning of resources
ES2975279T3 (en) * 2017-09-28 2024-07-04 Zte Corp Method and systems for exchanging messages on a wireless network
US11012112B2 (en) * 2018-02-09 2021-05-18 Qualcomm Incorporated Techniques for flexible resource allocation
CN114258138B (en) * 2021-12-20 2024-07-05 东南大学 Network-assisted full duplex mode optimization method based on load perception


Also Published As

Publication number Publication date
CN115052355A (en) 2022-09-13

Similar Documents

Publication Publication Date Title
CN109729528B (en) D2D resource allocation method based on multi-agent deep reinforcement learning
WO2023179010A1 (en) User packet and resource allocation method and apparatus in noma-mec system
CN109962728B (en) Multi-node joint power control method based on deep reinforcement learning
Balevi et al. A clustering algorithm that maximizes throughput in 5G heterogeneous F-RAN networks
CN114867030B (en) Dual-time scale intelligent wireless access network slicing method
CN114727318A (en) Multi-RIS communication network rate increasing method based on MADDPG
Iqbal et al. Convolutional neural network-based deep Q-network (CNN-DQN) resource management in cloud radio access network
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN111212438B (en) Resource allocation method of wireless energy-carrying communication technology
Liu et al. Deep reinforcement learning-based MEC offloading and resource allocation in uplink NOMA heterogeneous network
Ahsan et al. Reinforcement learning for user clustering in NOMA-enabled uplink IoT
CN115052355B (en) Network-assisted full duplex mode optimization method under mass terminals URLLC
CN115866787A (en) Network resource allocation method integrating terminal direct transmission communication and multi-access edge calculation
Mendoza et al. Deep reinforcement learning for dynamic access point activation in cell-free MIMO networks
Nouruzi et al. Toward a smart resource allocation policy via artificial intelligence in 6G networks: Centralized or decentralized?
CN115086964A (en) Dynamic spectrum allocation method and system based on multi-dimensional vector space optimization
CN114258138B (en) Network-assisted full duplex mode optimization method based on load perception
CN114051205B (en) Edge optimization method based on reinforcement learning dynamic multi-user wireless communication scene
CN106301501B (en) A kind of instant data transfer optimization method of combined coding modulation
CN115633402A (en) Resource scheduling method for mixed service throughput optimization
CN116170052A (en) Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method
CN115767703A (en) Long-term power control method for SWIPT-assisted de-cellular large-scale MIMO network
CN114980156A (en) AP switch switching method of large-scale MIMO system without cellular millimeter waves
CN107995034A (en) A kind of dense cellular network energy and business collaboration method
CN117793909A (en) Multi-domain resource joint allocation method and device in honeycomb-free mMIMO network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant