CN114867030B - Dual-time scale intelligent wireless access network slicing method

Info

Publication number: CN114867030B
Application number: CN202210649530.3A
Authority: CN
Inventors: 李佳珉; 王洁; 叶枫; 朱鹏程; 盛彬; 尤肖虎
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2022-06-09
Filing date: 2022-06-09
Publication date: 2024-07-09
Anticipated expiration: 2042-06-09
Also published as: CN114867030A

Abstract

The invention discloses a dual-time-scale intelligent radio access network slicing method. The method is based on a cell-free distributed massive MIMO system architecture and combines non-orthogonal multiple access with dynamic multi-connectivity for massive numbers of terminals. According to how the network state changes over the long term, it uses reinforcement learning algorithms to perform physical resource block (PRB) allocation and power allocation on two time scales, achieving adaptive resource allocation at different time and resource granularities. Compared with the prior art, the method configures resources jointly across an upper layer and a lower layer: given the number of PRBs that the upper layer configures for each slice, the lower layer allocates PRBs and power to each user on the small time scale according to changes in the physical-layer environment and dynamically selects links. This improves the spectral efficiency of the system, meets the ultra-high-reliability and ultra-low-latency service requirements of massive 6G traffic in the future, and is of great significance for research on real-time resource allocation in mobile scenarios.

Description

Dual-time scale intelligent wireless access network slicing method

Technical Field

The invention relates to a dual-time-scale intelligent wireless access network slicing method based on a cell-free distributed large-scale MIMO system architecture, and belongs to the technical field of mobile communication.

Background

With the rapid development of the mobile internet, the scale of communication services keeps growing and user demands are increasingly diverse. Limited spectrum is becoming scarcer while the requirements for high system throughput, ultra-low latency, ultra-high reliability and real-time connectivity keep rising, so conventional wireless communication systems need further improvement. Cell-free distributed massive MIMO is an innovative, scalable form of network MIMO in which a large number of APs distributed over a region serve all users on the same time-frequency resources. It offers very high spectral efficiency, energy efficiency and coverage. In addition, the cell-free structure resolves the problem of user mobility in cellular networks; compared with a centralized system, a cell-free distributed massive MIMO system offers channel diversity, no handover, higher coverage, and no need for cell deployment in a specific area. Furthermore, multi-connectivity effectively reduces the delay caused by retransmitting erroneous transmissions over a single link, which suits both the high-reliability requirement that 6G systems place on massive services and the characteristics of cell-free distributed massive MIMO; non-orthogonal multiple access supports massive user access over limited spectrum resources and further exploits the power domain to improve system throughput.

To meet the requirement that future 6G systems provide customized services at large scale, 6G systems pay more attention to the utilization of limited resources, which has driven the development of network slicing, a technology that shares resources by means of network virtualization. Network slicing uses independent and flexible virtual resource slices to abstract physical resources into virtual logical networks suited to different scenarios, and provides a strong guarantee for QoS. Research on core network slicing is fairly comprehensive and focuses mainly on the configuration and management of network slices; prior work on radio access network slicing combines self-optimization of multi-granularity network resources and proposes a hierarchical slicing architecture. Cell-free distributed massive MIMO and network slicing are highly compatible: on the one hand, a cell-free distributed massive MIMO system can reduce the randomness of the wireless channel in network slicing; on the other hand, network slicing makes the application of cell-free distributed massive MIMO more flexible. Research that combines cell-free distributed massive MIMO with network slicing is therefore increasingly important, and is significant for meeting the diverse requirements of future 6G and for dynamically allocating limited resources.

Disclosure of Invention

Technical problem: the object of the present invention is to provide a dual-time-scale intelligent radio access network slicing method based on a cell-free distributed massive MIMO system architecture, which combines network slicing with that architecture to allocate limited resources efficiently and dynamically.

Technical solution: the invention provides a method that jointly optimizes user QoS with algorithms operating on two time scales, namely a dual-time-scale intelligent radio access network slicing method. In a cell-free distributed massive MIMO system architecture, PRB allocation and power allocation are performed for uplink users under constraints that guarantee the user queue delay and satisfy the minimum average rate requirement of each slice and the user rate outage probability. The specific steps are as follows:

The method is based on a cell-free distributed massive MIMO system architecture. In the distributed massive MIMO system, J access points (APs), forming the AP set {1, 2, ..., J}, are connected to a central processing unit, and each AP is equipped with M antennas. Users in the coverage area are divided into different slices according to their service requirements; the slice set is {1, 2, ..., I}, and the users in slice i are U_i = {1, 2, ..., u_i}. In the dual-time-scale network slice structure, the small-scale time dimension t is a transmission time interval (TTI) of Δt = 1 ms, and the large-scale time dimension k comprises ΔT TTIs. In each TTI, the total bandwidth W is divided into F physical resource blocks (PRBs) shared by all APs, forming the PRB set {1, 2, ..., F}, and each PRB is allocated a bandwidth of B = W/F. The method specifically comprises the following steps:

Step S1, establishing a channel model and an uplink signal transmission model of the distributed massive MIMO system to obtain the uplink channel transmission expression and the transmission rate expressions of enhanced mobile broadband (eMBB) users and ultra-reliable low-latency communication (URLLC) users;

Step S2, establishing a slice model, in which a buffer data queue served according to a first-come-first-served policy is introduced at each AP for the users of each slice, so that the packet delay of a user can be divided into processing delay, transmission delay and queuing delay, and expressions are obtained for two quality of service (QoS) indicators, namely communication reliability and packet delay;

Step S3, establishing a hierarchical optimization model under constraints that guarantee the user queue delay and satisfy the minimum average rate requirement of each slice and the user rate outage probability;

Step S4, providing a dual-time-scale access network slicing method, in which the upper-layer controller first observes user traffic over the large-scale time and uses the deep Q-network (DQN) algorithm to allocate a different number of PRBs to each slice; based on the slice configuration obtained by the upper-layer controller, the lower-layer controller uses the multi-agent deep deterministic policy gradient (MADDPG) algorithm to allocate specific PRBs and power to each user in the slice according to the channel information in the small-scale time.

The step S1 specifically includes:

Step S101, considering a fading channel in a multi-connectivity scenario, the uplink channel gain between user u_i and the j-th AP on the f-th PRB in the t-th TTI is modeled as

Formula (1) combines the large-scale fading from the j-th AP to user u_i with the small-scale fading. The large-scale fading depends on the distance from user u_i to the j-th AP, the path-loss exponent ζ, and a log-normal shadow-fading variable; the elements of the small-scale fading follow a standard Rayleigh distribution.
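The three ingredients named in step S101 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patent's formula (1): the function names, the 8 dB shadowing standard deviation, and the composition "square root of large-scale fading times small-scale fading" are hypothetical choices commonly used for cell-free massive MIMO models.

```python
import math
import random


def large_scale_fading(distance, path_loss_exp=3.6, shadow_std_db=8.0, rng=random):
    """Assumed form: path loss d^(-zeta) times log-normal shadowing 10^(s/10)."""
    shadow_db = rng.gauss(0.0, shadow_std_db)  # shadowing in dB, s ~ N(0, sigma^2)
    return distance ** (-path_loss_exp) * 10.0 ** (shadow_db / 10.0)


def small_scale_fading(num_antennas, rng=random):
    """M i.i.d. complex Gaussian coefficients; their magnitudes are Rayleigh distributed."""
    s = math.sqrt(0.5)  # unit total variance per coefficient
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(num_antennas)]


def channel_vector(distance, num_antennas, rng=random, **fading_kwargs):
    """Assumed composition: g = sqrt(large-scale fading) * small-scale fading."""
    beta = large_scale_fading(distance, rng=rng, **fading_kwargs)
    return [math.sqrt(beta) * h for h in small_scale_fading(num_antennas, rng)]


if __name__ == "__main__":
    g = channel_vector(distance=100.0, num_antennas=50, rng=random.Random(0))
    print(len(g), abs(g[0]))
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which is convenient when comparing allocation policies on identical channel realizations.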

Step S102, two slice types are considered in the distributed architecture. One is the eMBB slice, whose data transmission rate conforms to Shannon capacity theory; the data transmission rate of eMBB users in the t-th TTI can be modeled as

The other is the URLLC slice, whose data rate is approximated by finite-blocklength theory; the data transmission rate of URLLC users in the t-th TTI can be modeled as

Formulas (2) and (3) both depend on the signal-to-noise ratio, Δt is the duration of one TTI, and B is the bandwidth; formula (3) additionally involves the channel dispersion, ρ_i is the average packet length of slice i, Q^-1(·) is the inverse Gaussian Q-function, and ε is the effective decoding error probability.
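To make the difference between the two rate models concrete, the sketch below evaluates the Shannon rate and the widely used normal approximation for finite blocklength. It is a hedged illustration: the exact expressions of formulas (2) and (3) are not recoverable from the text, so the dispersion V = 1 - (1 + SNR)^-2 and the use of a generic `blocklength` argument (where the patent refers to the average packet length ρ_i) are assumptions.

```python
import math
from statistics import NormalDist


def q_inv(eps):
    """Inverse Gaussian Q-function: Q^-1(eps) = Phi^-1(1 - eps)."""
    return NormalDist().inv_cdf(1.0 - eps)


def embb_rate(bandwidth_hz, snr):
    """Shannon rate in bit/s for a linear (not dB) SNR."""
    return bandwidth_hz * math.log2(1.0 + snr)


def urllc_rate(bandwidth_hz, snr, blocklength, eps):
    """Normal approximation: Shannon rate minus a dispersion penalty (assumed form)."""
    dispersion = 1.0 - 1.0 / (1.0 + snr) ** 2
    penalty = math.sqrt(dispersion / blocklength) * q_inv(eps) * math.log2(math.e)
    return bandwidth_hz * max(math.log2(1.0 + snr) - penalty, 0.0)


if __name__ == "__main__":
    # Illustrative numbers only: 180 kHz PRB, SNR of 10 (linear), eps = 0.05.
    print(embb_rate(180e3, 10.0))
    print(urllc_rate(180e3, 10.0, blocklength=200, eps=0.05))
```

For any ε below 0.5 the penalty is positive, so the URLLC rate stays below the eMBB rate at the same SNR and bandwidth; the gap shrinks as the blocklength grows.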

The step S2 specifically includes:

Step S201, the packet delay of a user is divided into processing delay, transmission delay and queuing delay, and the total delay D_i,t of slice i in the t-th TTI is

In formula (4), the three terms are the processing delay, the transmission delay and the queuing delay of slice i;

Step S202, the packet loss rate of the i-th slice is defined as the probability that the total delay of the packets in slice i exceeds a predefined maximum slice delay threshold; the packet drop rate, i.e. the packet loss rate δ_i,t, of the i-th slice in the t-th TTI can then be expressed as

δ_i,t = Pr{D_i,t > D_i^max} (5)

In equation (5), D_i,t is the total delay of slice i, D_i^max is the maximum packet delay acceptable for slice i, and Pr denotes probability. Packet delay and reliability serve as the two key indicators for evaluating QoS performance.
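In a simulation, the probability defined in step S202 is typically estimated from observed packet delays. The following sketch shows such an empirical estimate; the function name and the sample values are hypothetical.

```python
def packet_drop_rate(delays, max_delay):
    """Fraction of packets whose total delay exceeds the slice delay threshold."""
    if not delays:
        return 0.0  # no packets observed, nothing dropped
    late = sum(1 for d in delays if d > max_delay)
    return late / len(delays)


if __name__ == "__main__":
    # Hypothetical per-packet delays in milliseconds for one slice.
    observed = [0.4, 0.9, 1.3, 0.7, 2.1]
    print(packet_drop_rate(observed, max_delay=1.0))
```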

The step S3 specifically includes:

Step S301, the upper-layer control policy π_C converts the observed dynamics of traffic and QoS performance into the allocation of a number of PRBs to each slice. The upper-layer control policy π_C can therefore be expressed as a mapping, in the k-th large-scale time, from the global state S_k of the whole network to an appropriate per-slice PRB number configuration C_k, and can be modeled as

In equation (6), A_i represents the packet arrival rate of the users in slice i; the two QoS terms are the average packet delay and the average packet loss rate of all active users in slice i; and C_i,k is the PRB number configuration of slice i;

Step S302, in the t-th TTI of the k-th large-scale time, the lower-layer controller maps the observed user information X_t and the PRB number configuration C_k to the overall radio resource allocation E_t in the physical layer, and the lower-layer control policy π_E can be modeled as

In equation (7), C_k is the PRB number configuration of each slice and ΔT is the large-scale time length. The remaining quantities are the user queue length in slice i, the channel state information of the users, a binary user association factor that represents AP association and PRB allocation, and the power allocated to user u_i, which may take one of Z different power levels;

Step S303, in order to maximize the overall utility of the proposed hierarchical network slice optimization system, the utility function of the system is set to comprise two parts, upper-layer control and lower-layer control, so that the utility function U_i,k of the i-th slice in the k-th large-scale time can be modeled as

In formula (8), the QoS utility function of slice i is determined by the average delay and the average packet drop rate of all active users in slice i; the spectral efficiency utility function of slice i is determined by the data rates r_i,t of all active users in slice i; ΔT is the large-scale time length; and α_i,1, α_i,2, α_i,3 are positive weighting factors;

The goal of the hierarchical network slice architecture is to achieve optimal system performance while satisfying the radio resource constraints, so the optimization problem in hierarchical network slicing can be designed as follows:

In equation (9), max denotes maximization, π_E is the lower-layer control policy, π_C is the upper-layer control policy, π is the joint policy, U_i,n is the utility function of slice i at index n, and X is a discount factor, with Xⁿ tending to zero when n is sufficiently large. The optimization problem has the following constraints:

1) The total power allocated at each AP must not exceed the total power of that AP:

In formula (10), the bound is the total power of AP j;

2) Minimum data rate constraint for each slice:

In formula (11), the rate term is the transmission rate of user u_i when associated with the j-th AP on the f-th PRB, and the bound is the minimum data rate of the slice;

3) The total data processing rate of an AP on each slice is less than the maximum data processing rate that the AP can achieve:

In equation (12), R_j,i represents the total data processing rate of the j-th AP on slice i, and the bound is the maximum data processing rate of the j-th AP;

4) Packet delay constraint for each slice:

In equation (13), D_i,t is the total delay of slice i, and the bound is the maximum packet delay;

5) Packet loss rate constraint for each slice:

In equation (14), δ_i,t is the packet loss rate of slice i, and the bound represents the minimum packet loss rate;

6)

Equation (15) ensures that each AP allocates only one PRB to a given user, which lets each AP serve as many users as possible and reduces resource reuse at the same AP, thereby reducing interference;

7)

Equation (16) ensures that different APs cannot allocate the same PRB to the same user; the two factors in it are the association factors of two different APs to user u_i on the same PRB in the t-th TTI;

8)

Equation (17) ensures that the same AP allocates different PRBs to different users; the two factors in it are the allocation factors of two different PRBs to user u_i at the same AP in the t-th TTI;

9)

Equation (18) ensures that every active user in the system is connected to at least one AP and is allocated resources; the factor in it is the association factor of AP j to user u_i in the t-th TTI.
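The association constraints 6) to 9) can be checked mechanically on a binary allocation tensor. The sketch below is one possible reading of the prose descriptions, since the equations themselves are not available: the tensor layout `e[j][u][f]`, the function name, and in particular the interpretation of constraint 8) as "at one AP, a PRB serves at most one user" are assumptions.

```python
def check_association(e, active_users):
    """e[j][u][f] is 1 if AP j serves user u on PRB f in the current TTI, else 0."""
    aps = range(len(e))
    users = range(len(e[0]))
    prbs = range(len(e[0][0]))
    return {
        # 6) at each AP, a user holds at most one PRB
        "one_prb_per_user_per_ap": all(sum(e[j][u]) <= 1 for j in aps for u in users),
        # 7) two different APs never give the same PRB to the same user
        "no_duplicate_prb_across_aps": all(
            sum(e[j][u][f] for j in aps) <= 1 for u in users for f in prbs),
        # 8) assumed reading: at one AP, a PRB serves at most one user
        "one_user_per_prb_per_ap": all(
            sum(e[j][u][f] for u in users) <= 1 for j in aps for f in prbs),
        # 9) every active user is served by at least one AP on some PRB
        "active_users_connected": all(
            any(e[j][u][f] for j in aps for f in prbs) for u in active_users),
    }


if __name__ == "__main__":
    # 2 APs, 2 users, 2 PRBs: AP 0 serves user 0 on PRB 0 and user 1 on PRB 1,
    # AP 1 additionally serves user 0 on PRB 1 (multi-connectivity).
    e = [[[1, 0], [0, 1]], [[0, 1], [0, 0]]]
    print(check_association(e, active_users=[0, 1]))
```

A learning agent can use such a check to decide whether an action earns the spectral-efficiency reward or the fixed negative feedback.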

The step S4 specifically includes:

Step S401, given the per-slice PRB number configuration C_k ∈ C, the goal of learning the lower-layer control policy is to find an optimal policy that obtains the maximum expected reward over all states. The optimization problem of the lower-layer control policy is therefore designed as follows to achieve the maximum expected cumulative reward:

In formula (19), π_E is the lower-layer control policy and C_k is the PRB number configuration of the slices;

Step S402, the optimization problem of the lower-layer control policy can be solved with the MADDPG algorithm, with each AP acting as an agent and the communication network as the environment; the lower-layer controller should dynamically perform radio resource allocation actions according to the observed physical layer in order to achieve the maximum expected cumulative reward of the system;

Thus, for an agent:

1) State s_j: the channel state information H_j(t) and the queue information Q_j(t) of the users connected to the agent;

s_j＝{Q_j(t),H_j(t)} (20)

2) Action a_j: for AP j, the action corresponds to a radio resource allocation, comprising power allocation and PRB allocation, so the action of the agent at the current time t is expressed as

3) Reward r_j: the reward function of each agent is defined as the sum of the spectral efficiencies at the AP when the AP allocates PRBs and power within the constraints, and as negative feedback otherwise; the reward function of each agent can therefore be expressed as

R_reg in formula (22) represents a fixed value;
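The reward rule described for each agent can be sketched in a few lines. The function below is an illustration only: its name and signature are hypothetical, and returning the negated fixed value R_reg as the "negative feedback" is an assumption about the sign convention of formula (22).

```python
def agent_reward(spectral_efficiencies, constraints_ok, r_reg=1.0):
    """Per-agent reward: total spectral efficiency if feasible, fixed penalty otherwise."""
    if constraints_ok:
        # Sum of the spectral efficiencies of the users served by this AP.
        return sum(spectral_efficiencies)
    # Assumed sign convention: the fixed value R_reg enters as a penalty.
    return -abs(r_reg)


if __name__ == "__main__":
    print(agent_reward([1.5, 2.0], constraints_ok=True))
    print(agent_reward([1.5, 2.0], constraints_ok=False, r_reg=5.0))
```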

Step S403, the optimization problem of the upper-layer control policy can be solved with the DQN algorithm; the upper-layer controller should dynamically configure the number of PRBs of each slice according to the service traffic so as to maximize the overall utility of the system;

Thus, for the upper-layer controller:

1) State s_k: the global state information comprises the average arrival rate A_i of the users, the average delay, and the average packet loss rate

2) Action a_k: the action space of the upper-layer controller corresponds to the PRB number allocation C_k of each slice, where C_i,k is the number of PRBs configured for slice i; since there are I slices in total in the system, the action space can be represented by an I-dimensional vector;

3) Reward r_k: given the optimal lower-layer control policy, the convergence goal of the upper-layer control policy is to maximize the overall utility of the system. The reward function is therefore defined as the system utility when the constraints are satisfied and as negative feedback when they are not, specifically expressed as

R_reg in equation (25) represents a fixed value, and U_i,k is the utility function of the i-th slice in the k-th large-scale time.
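Because the upper-layer action is an I-dimensional vector of PRB counts, a DQN needs a finite list of candidate actions. The sketch below enumerates one such action set; the requirements that all PRBs are distributed and that every slice receives at least one PRB are assumptions, as is the function name.

```python
from itertools import product


def prb_configurations(num_prbs, num_slices, min_per_slice=1):
    """All ways to split num_prbs among the slices, each slice getting at least min_per_slice."""
    counts = range(min_per_slice, num_prbs + 1)
    return [c for c in product(counts, repeat=num_slices) if sum(c) == num_prbs]


if __name__ == "__main__":
    # With 6 PRBs and 2 slices (URLLC, eMBB) this includes the 2:4, 3:3 and 4:2 splits.
    for index, config in enumerate(prb_configurations(6, 2)):
        print(index, config)
```

Each tuple can be mapped to one output unit of the Q-network, so the network's output dimension equals the length of this list.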

Beneficial effects: the invention provides a dual-time-scale radio access network slicing method for the cell-free distributed massive MIMO architecture. It extends network slicing from the cellular architecture to the cell-free architecture and combines it with a hierarchical time model, which effectively improves the utilization of limited resources, makes resource allocation more responsive in real time, and meets the diverse requirements of future 6G.

Drawings

FIG. 1 shows the spectral efficiency achieved by the lower-layer controller when the PRBs are split 2:4 between the URLLC slice (slice 0) and the eMBB slice (slice 1); the red curve shows the spectral efficiency of the static resource allocation method;

FIG. 2 shows the spectral efficiency achieved by the lower-layer controller when the PRBs are split 3:3 between the URLLC slice (slice 0) and the eMBB slice (slice 1); the red curve shows the spectral efficiency of the static resource allocation method;

FIG. 3 shows the spectral efficiency achieved by the lower-layer controller when the PRBs are split 4:2 between the URLLC slice (slice 0) and the eMBB slice (slice 1); the red curve shows the spectral efficiency of the static resource allocation method;

FIG. 4 shows the simulation result of the upper-layer controller's configuration of the number of PRBs per slice.

Detailed Description

The invention is described in detail below with reference to examples:

Assume a 0.5 × 0.5 m² cell-free distributed massive MIMO system with 2 APs, each equipped with 50 antennas. In this coverage area there are two types of users with different service types: users requiring high-reliability, ultra-low-latency transmission are assigned to the URLLC slice, slice 0; users requiring high-data-rate services are assigned to the eMBB slice, slice 1.

The channel model consists of three parts: path loss, shadow fading and small-scale fading. The path fading factor is set to a = 3.6 and the reference distance to 1; the shadow-fading variable follows a log-normal distribution; and the elements of the small-scale fading follow a standard Rayleigh distribution.

Within the dual-time-scale network slice structure, the small-scale time dimension t is a transmission time interval of Δt = 1 ms, and the large-scale time dimension k comprises ΔT TTIs, with ΔT = 10 ms. In each TTI, the total bandwidth W is divided into 6 PRBs shared by all APs, forming the PRB set {1, 2, ..., 6}. The method comprises the following steps:

Step S1, establishing a channel model and an uplink signal transmission model of the distributed massive MIMO system to obtain the uplink channel transmission expression and the transmission rate expressions of the two types of users (URLLC and eMBB).

In this embodiment, step S1 specifically includes:

Step S11, considering a fading channel in a multi-connectivity scenario, the uplink channel gain between user u_i and the j-th AP on the f-th PRB in the t-th TTI is modeled as

Step S12, for the eMBB slice, whose data transmission rate conforms to Shannon capacity theory, the data transmission rate of eMBB users in the t-th TTI can be modeled as

Formula (3) involves the channel dispersion; Q^-1(·) is the inverse Gaussian Q-function, ρ_i is the average packet length of slice i, and ε is the effective decoding error probability, set to 0.05. The signal-to-noise ratio appearing in formulas (2) and (3) can be modeled as

In equation (4), the additive white Gaussian noise power is σ² = -174 dBm/Hz; the power term is the power allocated by AP j to user u_i of slice i on the f-th PRB in the t-th TTI, selectable from the values 0, 9, 19 and 29.

Step S2, establishing a slice model, in which a buffer data queue served according to a first-come-first-served policy is introduced at each AP for the users of each slice, so that expressions are obtained for two quality of service indicators, namely communication reliability and packet delay.

In this embodiment, step S2 specifically includes:

Step S21, assume that each user has a data queue at the AP to buffer incoming data packets. The total data packet length in slice i is Ω_i, set to Ω_0 = 1000 bytes and Ω_1 = 5000 bytes, and the data queue is served according to the first-come-first-served policy. In the t-th TTI, the queue length waiting to be sent in the buffer of user u_i in slice i is Q_ui(t), and the queue update process of user u_i is

In equation (5), A_i represents the packet arrival rate of the users in slice i, set to A_0 = 0.2 packets/s and A_1 = 1 packet/s, and the rate term is the transmission rate of user u_i.
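A per-TTI queue update of the kind described in step S21 is often written as "serve what the link can carry, then add the new arrivals". The sketch below uses that common form as an assumption; it is not necessarily the patent's equation (5), and the function name and units (bits and bit/s) are hypothetical.

```python
def update_queue(queue_bits, arrivals_bits, rate_bps, tti_s=1e-3):
    """Assumed update: Q(t+1) = max(Q(t) - r(t) * dt, 0) + arrivals during the TTI."""
    served_bits = rate_bps * tti_s  # what the link can drain in one TTI
    return max(queue_bits - served_bits, 0.0) + arrivals_bits


if __name__ == "__main__":
    backlog = 8000.0  # one 1000-byte packet waiting
    for _ in range(3):
        backlog = update_queue(backlog, arrivals_bits=0.0, rate_bps=2e6)
        print(backlog)
```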

Step S22, the packet delay of a user is divided into processing delay, transmission delay and queuing delay, and the total delay of slice i in the t-th TTI is

1) The transmission delay is the time required to transmit a data packet over the link between the AP and the slice. The transmission delay of slice i in the t-th TTI can therefore be expressed as

In equation (7), R_i,t is the total transmission rate of slice i;

2) The processing delay is the time required to process a data packet after the AP receives the data request of the corresponding user. The processing delay of slice i in the t-th TTI can be expressed as

In equation (8), R_j,i represents the total data processing rate of the j-th AP on slice i, set to R_j,0 = 1 Mbit/s and R_j,1 = 0.5 Mbit/s;

3) According to queuing theory, the average sojourn time of an arriving data packet in slice i (waiting time plus service time), i.e. the queuing delay of slice i in a TTI, is

In formula (9), μ_i represents the service rate of the users in slice i; θ_i is the average service rate of each PRB in slice i, with θ_0 = 50 bit/s and θ_1 = 30 bit/s; C_i is the PRB configuration of slice i; and U_i is the number of users in slice i, set to 3.
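The queuing delay in item 3) can be illustrated with the textbook M/M/1 result, in which the mean time in system is 1/(μ - λ). This is a sketch under explicit assumptions: the M/M/1 model itself, the relation μ_i = θ_i · C_i / U_i between the per-user service rate and the PRB configuration, and the function name are all hypothetical, and the arrival and service rates must be expressed in the same unit.

```python
def queuing_delay(arrival_rate, per_prb_service_rate, num_prbs, num_users):
    """Mean M/M/1 sojourn time 1 / (mu - lambda), with mu = theta * C / U (assumed)."""
    service_rate = per_prb_service_rate * num_prbs / num_users
    if service_rate <= arrival_rate:
        return float("inf")  # unstable queue: the backlog grows without bound
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    # Giving a slice more PRBs raises its service rate and shortens the delay.
    for prbs in (2, 3, 4):
        print(prbs, queuing_delay(0.2, 50.0, prbs, 3))
```

This shape explains why the upper-layer controller's PRB split matters for the delay constraint: the delay falls monotonically as C_i grows.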

Step S23, the packet loss rate of the i-th slice is defined as the probability that the total delay of the packets in slice i exceeds a predefined maximum slice delay threshold. The packet drop rate, i.e. the packet loss rate δ_i,t, of the i-th slice in the t-th TTI can then be expressed as

δ_i,t = Pr{D_i,t > D_i^max} (10)

In equation (10), D_i,t is the total delay of slice i, D_i^max is the maximum packet delay acceptable for slice i, and Pr denotes probability. Packet delay and reliability serve as the two key indicators for evaluating QoS performance.

Step S3, establishing a hierarchical optimization model under constraints that guarantee the user queue delay and satisfy the minimum average rate requirement of each slice and the user rate outage probability.

In this embodiment, step S3 specifically includes:

Step S31, the upper-layer control policy π_C converts the observed dynamics of traffic and QoS performance into the allocation of a number of PRBs to each slice. The upper-layer control policy π_C can therefore be expressed as a mapping from the global state S_k of the whole network to an appropriate per-slice PRB number configuration C_k, and can be modeled as

In equation (11), A_i represents the packet arrival rate of the users in slice i; the two QoS terms are the average packet delay and the average packet loss rate of the users of slice i; and C_i,k is the PRB number configuration of slice i.

Step S32, in each TTI of the k-th large-scale time, the lower-layer controller maps the observed user information X_t and the PRB number configuration C_k to the overall radio resource allocation E_t in the physical layer, and the lower-layer control policy π_E can be modeled as

In equation (12), C_k is the PRB number configuration of each slice and ΔT is the large-scale time length. The remaining quantities are the user queue length in slice i, the channel state information of the users, a binary user association factor that represents AP association and PRB allocation, and the power allocated to user u_i, which may take one of Z different power levels.

Step S33, in order to maximize the overall utility of the proposed hierarchical network slice optimization system, the utility function of the system is set to comprise two parts, upper-layer control and lower-layer control, so that the utility function of the i-th slice in the k-th large-scale time can be modeled as

In formula (13), the QoS utility function of slice i is determined by the average delay and the average packet drop rate of all active users in slice i; the spectral efficiency utility function of slice i is determined by the data rates r_i,t of all active users in slice i; ΔT is the large-scale time length; and α_i,1, α_i,2, α_i,3 are positive weighting factors, set to 1, 10⁶ and 10⁵ respectively.

The goal of the hierarchical network slice architecture is to achieve optimal system performance while satisfying the radio resource constraints. The optimization problem in hierarchical network slicing can therefore be designed as follows:

In equation (14), π_E is the lower-layer control policy, π_C is the upper-layer control policy, π is the joint policy, U_i,n is the utility function of slice i at index n, and X is a discount factor, with Xⁿ tending to zero when n is sufficiently large. The optimization problem has the following constraints:

1) The total power allocated at each AP must not exceed the total available power:

In formula (15), the bound is the total power of all the APs;

2) Minimum data rate constraint for each slice:

In formula (16), the rate term is the transmission rate of user u_i when associated with the j-th AP on the f-th PRB, and the bound is the minimum data rate of the slice, where

3) The total data processing rate of an AP on each slice is less than the maximum data processing rate that the AP can achieve:

In equation (17), R_j,i represents the total data processing rate of the j-th AP on slice i, and the bound is the maximum data processing rate of the j-th AP, which is set as

4) Packet delay constraint for each slice:

In equation (18), D_i,t is the total delay of slice i, and the bound is the maximum packet delay, which is set as

5) Packet loss rate constraint for each slice:

In equation (19), δ_i,t is the packet loss rate of slice i, and the bound represents the minimum packet loss rate, which is set as

6)

Equation (20) ensures that each AP allocates only one PRB to a given user, which lets each AP serve as many users as possible and reduces resource reuse at the same AP, thereby reducing interference;

7)

Equation (21) ensures that different APs cannot allocate the same PRB to the same user; the two factors in it are the association factors of two different APs to user u_i on the same PRB in the t-th TTI;

8)

Equation (22) ensures that the same AP allocates different PRBs to different users; the two factors in it are the allocation factors of two different PRBs to user u_i at the same AP in the t-th TTI;

9)

Equation (23) ensures that every active user in the system is connected to at least one AP and is allocated resources; the factor in it is the association factor of AP j to user u_i in the t-th TTI.

Step S4, a dual-time-scale network slicing method is provided. The upper-layer controller first observes user traffic over the large-scale time and uses the DQN algorithm to allocate a different number of PRBs to each slice, so that PRB resources can be shared among slices; based on the slice configuration obtained by the upper-layer controller, the lower-layer controller uses the MADDPG algorithm to allocate specific PRBs and power to each user in the slice according to the channel state and user queue information in the small-scale time.

In this embodiment, step S4 specifically includes:

Step S41, given the per-slice PRB number configuration C_k ∈ C, the goal of learning the lower-layer control policy is to find an optimal policy that obtains the maximum expected reward over all states. The optimization problem of the lower-layer control policy is therefore designed as follows to achieve the maximum expected cumulative reward:

Step S42, the optimization problem of the lower-layer control policy can be solved with the MADDPG algorithm, with each AP acting as an agent and the communication network as the environment. The lower-layer controller should dynamically perform radio resource allocation actions according to the observed physical layer in order to achieve the maximum expected cumulative reward of the system.

Thus, for an agent:

1) State s_j: considering that the packet arrival rate of each slice is always the same and the user queue remains in the same state, the state of the agent at the current time t can be simplified to

2) Action a_j: for AP j, the action corresponds to a radio resource allocation, comprising power allocation and PRB allocation. The action of the agent at the current time t is therefore expressed as

3) Reward r_j: the reward function of the agent is defined as the sum of the spectral efficiencies at the AP when the AP allocates PRBs and power within the constraints, and as negative feedback otherwise. The reward function of each agent can therefore be expressed as

R_reg in equation (27) represents a fixed value.

Step S43, the lower layer controller allocates PRB and power by utilizing MADDPG algorithm, comprising the following steps:

1) Initialize the neural networks with random parameters, and set tracking_episode = 1;

2) At the start of each training episode, initialize the environment state; all APs observe the initial state s, and time_slot is set to 1;

3) In each TTI, every AP selects an action a according to its observed state, i.e. performs PRB allocation and power allocation for the users; the environment then gives the agent a reward r according to whether the action satisfies the constraints, and enters the next state s';

4) Collect the state transition sequences (s, a, r, s') of all APs and store them in the experience buffer;

5) The lower-layer controller updates the critic network by minimizing the loss function of the action-value function of each agent j, and calculates the action gradients of all agents;

6) All APs receive the action gradients and update their actor networks accordingly;

7) Iterate time_slot from 1 to T_L: set time_slot = time_slot + 1, update the user positions, and return to step 3);

8) Iterate tracking_episode from 1 to K_L: set tracking_episode = tracking_episode + 1 and return to step 2), until the algorithm converges.
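The control flow of steps 1) to 8) can be sketched as a nested episode and time-slot loop around an experience buffer. This is only a skeleton under stated assumptions: `run_lower_layer_training`, `select_action`, `env_step`, and `update_networks` are hypothetical names, the state is a placeholder number per AP, and the critic and actor updates of MADDPG are left to the caller rather than implemented here.

```python
import random
from collections import deque


def run_lower_layer_training(num_episodes, slots_per_episode, num_aps,
                             select_action, env_step, update_networks,
                             buffer_size=10000, batch_size=4, seed=0):
    """Episode / time-slot loop with an experience buffer (sketch only)."""
    rng = random.Random(seed)
    buffer = deque(maxlen=buffer_size)            # experience buffer, step 4)
    for _episode in range(1, num_episodes + 1):   # step 8): 1..K_L
        states = [0.0] * num_aps                  # step 2): placeholder state
        for _slot in range(1, slots_per_episode + 1):   # step 7): 1..T_L
            # Step 3): every AP picks an action from its own observation.
            actions = [select_action(j, states[j], rng)
                       for j in range(num_aps)]
            # The environment returns per-AP rewards and the next states
            # (user position updates are assumed to happen inside env_step).
            rewards, next_states = env_step(states, actions)
            buffer.append((states, actions, rewards, next_states))
            if len(buffer) >= batch_size:
                batch = rng.sample(list(buffer), batch_size)
                update_networks(batch)            # steps 5) and 6), stubbed
            states = next_states
    return buffer
```

Passing the update step in as a callback keeps the loop structure separate from the learning rule, so the same skeleton can be exercised with trivial stubs before real actor and critic networks are plugged in.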

Step S44: under the converged lower-layer control policy, the optimization problem of the upper-layer control policy π_C is designed as follows, so as to learn the optimal upper-layer control policy;

In step S45, the optimization problem of the upper-layer control policy can be solved using the DQN algorithm. The upper-layer controller should dynamically configure the number of PRBs of each slice according to the service traffic, so as to maximize the overall utility of the system.

Thus, for the upper-layer controller:

1) State s_k: since the user packet arrival rate of each slice is a fixed value and the average packet loss rate is determined by the average delay, the state can be reduced to

2) Action a_k: the action space of the upper-layer controller corresponds to the PRB number configuration C_k of the slices, where C_{i,k} is the number of PRBs configured for slice i. Since there are I slices in total in the system, the action space can be represented by an I-dimensional vector;

3) Reward r_k: given the optimal lower-layer control policy, the convergence goal of the upper-layer control policy is to maximize the overall utility of the system. Thus, the reward function is defined as the system utility when the constraints are satisfied, and as negative feedback when they are not, specifically expressed as

In equation (31), r_reg represents a fixed value.
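To make the I-dimensional action space concrete, the following Python sketch enumerates candidate PRB number configurations. It assumes that all PRBs are distributed among the slices and that every slice receives at least one PRB; neither assumption is stated in the source, and the name `enumerate_prb_configs` is hypothetical.

```python
from itertools import product


def enumerate_prb_configs(num_slices, total_prbs, min_per_slice=1):
    """List all vectors (C_1k, ..., C_Ik) whose entries sum to total_prbs.

    Assumptions (not from the source): every PRB is assigned to some
    slice, and each slice gets at least min_per_slice PRBs.
    """
    counts = range(min_per_slice, total_prbs + 1)
    return [config
            for config in product(counts, repeat=num_slices)
            if sum(config) == total_prbs]


# Two slices (URLLC, eMBB) sharing 6 PRBs, as in the simulated ratios.
actions = enumerate_prb_configs(num_slices=2, total_prbs=6)
```

Under these assumptions the resulting list contains the 2:4, 3:3, and 4:2 configurations used in the simulations, and each entry can serve as one discrete action index for a DQN.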

Step S46: the upper-layer controller configures the number of PRBs of each slice using the DQN algorithm, comprising the following steps:

2) At the start of each training episode, initialize the environment state; the upper-layer controller observes the initial state s, and time_slot is set to 1;

3) The upper-layer controller takes an action a according to the observed state using the ε-greedy algorithm, obtains the corresponding reward r, and enters the next state s';

4) Store the state transition sequence (s, a, r, s') in the experience buffer;

5) Update the weights of the Q-function in the DQN by performing stochastic gradient descent to minimize the loss function

6) Iterate time_slot from 1 to T_U: set time_slot = time_slot + 1 and return to step 3);

7) Iterate tracking_episode from 1 to K_U: set tracking_episode = tracking_episode + 1 and return to step 2), until the algorithm converges.
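Two building blocks of the steps above, ε-greedy action selection and the quantity minimized by stochastic gradient descent, can be sketched as follows. The sketch uses the standard one-step Q-learning target; the discount factor `gamma` and all function names are assumptions, since the source does not reproduce the loss function here.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Return a random action index with probability epsilon,
    otherwise the index of the largest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Standard one-step target r + gamma * max_a' Q(s', a').
    gamma is an assumed discount factor."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_loss(q_sa, target):
    """Squared error between the current estimate Q(s, a) and the target."""
    return (target - q_sa) ** 2


rng = random.Random(0)
greedy_action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the selection is purely greedy; in training, ε is typically started high and decayed so that the controller explores PRB configurations before settling on the highest-reward action.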

The above describes the complete process of dynamic resource allocation for a cell-free massive MIMO wireless access network using the method of the invention.

FIG. 1 is a graph of the spectral efficiency of the lower-layer controller when PRBs are allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) in a 2:4 ratio, where the red curve represents the spectral efficiency of the static resource allocation (SRA) method;

FIG. 2 is a graph of the spectral efficiency of the lower-layer controller when PRBs are allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) in a 3:3 ratio, where the red curve represents the spectral efficiency of the static resource allocation (SRA) method;

FIG. 3 is a graph of the spectral efficiency of the lower-layer controller when PRBs are allocated to the URLLC slice (slice 0) and the eMBB slice (slice 1) in a 4:2 ratio, where the red curve represents the spectral efficiency of the static resource allocation (SRA) method;

As can be seen from the above figures, when the MADDPG algorithm is used to learn the lower-layer control policy, the best performance is learned under all PRB number configurations. Learning of the lower-layer control policy converges at about 10,000 episodes, and its performance is almost twice that of the SRA policy.

FIG. 4 shows the simulation result for the upper-layer controller's configuration of the number of PRBs per slice, i.e. the system utility. As can be seen from the figure, as the learning steps iterate, the DQN algorithm converges to the action with the highest reward and, according to the set weights, selects the PRB resource configuration that maximizes the overall utility of the system; 6 PRBs are allocated to the URLLC slice and the eMBB slice. Therefore, the upper-layer control policy can obtain the optimal solution by using the DQN algorithm to solve the upper-layer PRB number configuration of the slices.

The invention provides a dual-time-scale wireless access network slicing method for a cell-free distributed massive MIMO architecture. It extends network slicing from the cellular architecture to the cell-free architecture and combines it with a hierarchical time model, thereby effectively improving the utilization of limited resources, enhancing the real-time performance of resource allocation, meeting the diverse requirements of future 6G, and serving multiple communication scenarios; it therefore has practical and research value.

Claims

1. A dual-time-scale intelligent wireless access network slicing method, characterized in that the method is based on a cell-free distributed massive MIMO system architecture; in the distributed massive MIMO system, J access points (APs) are connected to a central processing unit, the AP set being {1, 2, ..., J}, and each AP is equipped with M antennas; users within the coverage area are divided into different slices according to service requirements, the slice set being {1, 2, ..., I}, and the user set of slice i being U_i = {1, 2, ..., u_i}; in the dual-time-scale network slicing structure, the small-scale time unit is a transmission time interval (TTI) of Δt = 1 ms, and each large-scale time period k comprises ΔT TTIs; in each TTI, the total bandwidth W is divided into F physical resource blocks (PRBs) shared by all APs, the PRB set being {1, 2, ..., F}, and each PRB is allocated a bandwidth of B = W/F; the method specifically comprises the following steps:

step S2: a slicing model is established; for the users of each slice, a buffered data queue served according to a first-come-first-served policy is introduced at each AP, so that a user's packet delay is divided into processing delay, transmission delay, and queuing delay, and expressions are obtained for the two quality-of-service (QoS) indicators, namely communication reliability and packet delay;

step S4: a dual-time-scale access network slicing method is provided, wherein the upper-layer controller first observes the user service traffic on the large time scale using the deep Q-network (DQN) algorithm and configures a different number of PRBs for each slice; based on the slice configuration obtained by the upper-layer controller, the lower-layer controller uses the multi-agent deep deterministic policy gradient (MADDPG) algorithm to allocate specific PRBs and power to each user in a slice according to the channel information on the small time scale;

the step S1 specifically includes:

Equations (2) and (3) involve the signal-to-noise ratio; Δt denotes one TTI and B is the bandwidth; in equation (3), V_{j,f,u_i,t} denotes the channel dispersion, ρ_i is the average packet length of slice i, Q^{-1}(·) is the inverse of the Gaussian Q-function, and ε is the effective decoding error probability;

the step S2 specifically includes:

In equation (5), D_{i,t} is the total delay of slice i, the threshold term denotes the maximum packet delay acceptable for slice i, and Pr denotes probability; packet delay and reliability serve as the two key indicators for evaluating QoS performance;

the step S3 specifically includes:

step S301: the upper-layer control policy π_C translates the observed dynamics of traffic and QoS performance into the PRB number configuration of each slice; π_C is therefore expressed as a mapping, in the k-th large-scale time period, from the global state S_k of the whole network to the appropriate PRB number configuration C_k of the slices, and can be modeled as

In equation (7), C_k is the PRB number configuration per slice and ΔT is the large-scale time length; the remaining terms are the user queue length in slice i, the channel state information of the users, a binary user association factor representing AP association and PRB allocation, and the power allocated to user u_i, which takes one of Z different power levels;

The goal of the hierarchical network slice architecture is to achieve optimal system performance based on meeting radio resource constraints, so the optimization problem in hierarchical network slices is designed as follows:

In equation (10), the power limit is the total power of AP j;

2) Minimum constraint of data rate per slice:

3) The total data processing rate at each slice of an AP is less than the maximum data processing rate achieved by the AP:

4) Packet delay constraint per slice:

5) Packet loss rate constraint for each slice:

6)

7)

8)

equation (17) ensures that the same AP allocates different PRBs to different users; the two factors in it are the allocation factors of two different PRBs to user u_i at the same AP in the t-th TTI;

9)

equation (18) ensures that each active user in the system is connected to at least one AP and is allocated resources; the factor in it represents the association of AP j with user u_i in the t-th TTI;

the step S4 specifically includes:

step S402: the optimization problem of the lower-layer control policy is solved using the MADDPG algorithm, with the APs and the communication network serving as the agents and the environment, respectively; based on its observations of the physical layer, the lower-layer controller should dynamically perform radio resource allocation actions so as to maximize the expected cumulative reward of the system;

thus, for agent j:

s_j = {Q_j(t), H_j(t)} (20)

3) Reward r_j: the reward function of each agent is defined as the sum of the spectral efficiency at the AP when the AP's PRB and power allocation satisfies the constraints, and otherwise as negative feedback; the reward function of each agent is thus expressed as

in equation (22), r_reg represents a fixed value;

step S403: the optimization problem of the upper-layer control policy is solved using the DQN algorithm, wherein the upper-layer controller should dynamically configure the number of PRBs of each slice according to the service traffic, so as to maximize the overall utility of the system;

thus, for the upper-layer controller:

2) Action a_k: the action space of the upper-layer controller corresponds to the PRB number configuration C_k of the slices, where C_{i,k} is the number of PRBs configured for slice i; since there are I slices in total in the system, the action space is represented by an I-dimensional vector;

in equation (25), r_reg represents a fixed value, and U_{i,k} is the utility function of the i-th slice in the k-th large-scale time period.