CN115866787A

CN115866787A - Network resource allocation method integrating device-to-device (D2D) communication and multi-access edge computing (MEC)

Info

Publication number: CN115866787A
Application number: CN202211334660.4A
Authority: CN
Inventors: 姜华; 窦增; 丛犁; 黄成斌; 隋吉生; 李佳; 葛晓楠; 苏丛哲
Original assignee: State Grid Jilin Electric Power Corp; Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Current assignee: State Grid Jilin Electric Power Corp; Information and Telecommunication Branch of State Grid Jilin Electric Power Co Ltd
Priority date: 2022-10-28
Filing date: 2022-10-28
Publication date: 2023-03-28

Abstract

The invention provides a network resource allocation method that integrates device-to-device (D2D) communication and multi-access edge computing (MEC), belonging to the technical field of power grid data transmission and equipment inspection. The method comprises: iteratively selecting the devices that take part in D2D communication; and, for the selected devices, selecting the offloading location and spectrum resources using a pre-trained resource allocation policy built on a DDQN-based deep reinforcement learning framework. The invention realizes multi-level offloading through MEC, and realizes reuse and distributed scheduling of communication resources through D2D communication. A system benefit function combining network throughput, power consumption and computation delay is established, and the benefit maximization problem under link interference and power constraints is solved to obtain the offloading selection and resource allocation. A DDQN-based deep reinforcement learning framework jointly optimizes 5G resource block allocation and computation offloading, maximizing network throughput and reducing computation delay.

Description

Network resource allocation method integrating device-to-device (D2D) communication and multi-access edge computing (MEC)

Technical Field

The invention relates to the technical field of power grid data transmission and equipment inspection, and in particular to a distributed network resource allocation method based on deep reinforcement learning that integrates device-to-device (D2D) communication and multi-access edge computing (MEC).

Background

The high reliability, massive connectivity and low latency of fifth-generation mobile communication technology (5G) will enable rapid development of the power industry. As the smart grid develops, using 5G and machine learning algorithms to make substation inspection intelligent and efficient is of practical importance. Key technologies such as device-to-device (D2D) communication and multi-access edge computing (MEC) can effectively improve the ability of 5G to serve the smart grid, but the problem of optimally allocating communication resources under D2D spectrum reuse and interference must be solved.

MEC provides cloud computing capabilities pushed down into the radio access network, close to the terminal devices. Applications and services run at the edge of the mobile network, which reduces service delay and congestion in other parts of the mobile core network. Wenhe Li et al. propose an intelligent control method for electric power working robots based on cloud computing and edge computing; using a typical substation live-working robot scenario, their example verifies that the method can meet the computing requirements of substation tasks. Han et al. use an unmanned aerial vehicle (UAV) as an edge node to assist task offloading and relaying for Internet of Things devices, and obtain the maximum system security capacity by jointly optimizing the UAV position, the task offloading rate and the allocation of offloading users. An intelligent inspection task allocation mechanism trained with deep reinforcement learning has also been proposed to reduce the delay and energy consumption of task offloading. However, the MEC offloading strategies described above optimize only offloading delay and energy consumption with respect to offloading location and computing capacity, and do not consider the allocation and optimization of communication resources. Given the complex transmission environment of a substation and the diversity of its data and offloading modes, an efficient wireless resource allocation and scheduling mechanism is needed so that data offloading is interference-free, stable and reliable.

A cognitive D2D network, formed by combining D2D communication with the wireless network, obtains proximity gain and channel reuse gain through spectrum reuse, thereby improving the data transmission efficiency of the 5G network and supporting concurrent device access. Unlike MEC systems, distributed-computing D2D networks have more complex topology management requirements and need efficient resource scheduling strategies. For cooperative D2D communication in an uplink cellular network, prior work has allocated spectrum and power resources to maximize the average network throughput under an energy consumption constraint. Emna Fakhfakh et al. propose a D2D mode selection scheme based on a new criterion that, by introducing noise parameters related to resource allocation, maximizes system throughput and cellular traffic offloading efficiency. Other researchers use game theory to analyze resource allocation for maximizing the energy efficiency of cognitive D2D networks, balancing energy efficiency and spectrum efficiency under a user interference threshold constraint. Mode selection and resource allocation for D2D users accessing the cellular network have also been studied using evolutionary theory to maximize the total D2D user data rate. These resource optimization methods can find the optimal solution when the problem is small, but their solution complexity grows as the amount of system resources increases, whereas deep reinforcement learning performs well on such resource optimization problems.
Under channel interference, using D2D technology to achieve spectrum reuse, reliable transmission of inspection device data and efficient use of MEC network resources is a key problem that urgently needs to be solved, and is the focus of this embodiment.

Disclosure of Invention

The invention aims to provide a network resource allocation method based on deep reinforcement learning that integrates D2D communication and multi-level edge offloading, so as to solve at least one technical problem described in the background.

In order to achieve the purpose, the invention adopts the following technical scheme:

The invention provides a network resource allocation method that integrates device-to-device (D2D) communication and multi-access edge computing (MEC), comprising the following steps:

iteratively selecting the devices for D2D communication;

for the selected devices, selecting the offloading location, spectrum resources and power resources using a pre-trained resource allocation policy of a DDQN-based deep reinforcement learning framework;

wherein training the resource allocation model comprises:

randomly initializing the policy, starting an environment simulator, and generating the D2D communication devices, the inspection targets and a base station with an integrated MEC server; initializing the Q-network and the Target Q-network and generating initial weights;

iteratively selecting D2D communication devices, selecting an offloading location according to the resource optimization strategy, and determining the transmit power and spectrum;

in the environment simulator, selecting an action from the Q-network according to the ε-greedy rule, entering a new state, calculating the network throughput and energy consumption from the current spectrum occupancy, generating a reward according to the set reward function, calculating a new Q value, and storing the calculated network throughput, energy consumption and updated Q value in the Experience Replay;

sampling data from the Experience Replay to train the network; and updating the weights of the Target Q-network at intervals until the loss function converges, yielding the trained resource allocation model.
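The training steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `ToyEnv` is a hypothetical stand-in for the environment simulator, the Q-network and Target Q-network are plain NumPy tables rather than neural networks, the replay buffer stores (state, action, reward, next state) tuples, and all hyperparameters are illustrative. It shows ε-greedy action selection, experience replay, the double-Q target and the periodic Target Q-network update.

```python
import random
from collections import deque

import numpy as np


class ToyEnv:
    """Hypothetical stand-in for the environment simulator (4 states, 3 actions)."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_states, self.n_actions = 4, 3
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Reward 1 when the action index equals state % n_actions, else 0.
        reward = 1.0 if action == self.state % self.n_actions else 0.0
        self.state = int(self.rng.integers(self.n_states))
        return self.state, reward


def train(episodes=200, steps=20, gamma=0.5, lr=0.1, eps=0.2,
          batch=16, sync_every=50, seed=0):
    random.seed(seed)
    env = ToyEnv(seed)
    q = np.zeros((env.n_states, env.n_actions))  # online Q "network" (a table here)
    q_target = q.copy()                          # Target Q-network
    replay = deque(maxlen=1000)                  # Experience Replay
    t = 0
    for _ in range(episodes):
        s = env.reset()
        for _ in range(steps):
            # epsilon-greedy action selection from the online network
            if random.random() < eps:
                a = random.randrange(env.n_actions)
            else:
                a = int(np.argmax(q[s]))
            s_next, r = env.step(a)
            replay.append((s, a, r, s_next))
            s = s_next
            t += 1
            if len(replay) >= batch:
                for bs, ba, br, bs_next in random.sample(list(replay), batch):
                    a_star = int(np.argmax(q[bs_next]))         # select with online net
                    y = br + gamma * q_target[bs_next, a_star]  # evaluate with target net
                    q[bs, ba] += lr * (y - q[bs, ba])           # step on the squared error
            if t % sync_every == 0:
                q_target = q.copy()                             # periodic target update
    return q
```

Calling `train()` returns the learned table; in a real system the table would be replaced by a neural network and the toy reward by the throughput, energy and delay terms computed by the simulator.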

Preferably, the resource optimization strategy jointly considers throughput, energy consumption and computation delay requirements, and a resource optimization allocation model based on maximizing a comprehensive benefit function is established as follows:

s.t.C1:

C2:

where $m \in \{1, 2, \ldots, M\}$ indexes the set of $M$ inspection devices, which corresponds to the $M$ device-to-base-station transmission links; $C_m$ represents the capacity of the transmission link from the $m$-th device to the base station;

represents the capacity of the $i$-th receiver of the $m$-th device; $\tau_m$ represents the computation delay; $\alpha_{k,m}$ represents the channel reuse coefficient, with $\alpha_{k,m} = 1$ when the $k$-th device-to-device transmission link reuses the spectrum of the $m$-th device-to-base-station transmission link and $\alpha_{k,m} = 0$ otherwise; $\chi_{[j,m'][m,i]} = 1$ denotes that the $m'$-th receiver of the $j$-th inspection device and the $i$-th receiver of the $m$-th inspection device use the same spectrum resource, and $\chi_{[j,m'][m,i]} = 0$ otherwise; $P_m$ represents the transmit power of the $m$-th device-to-base-station transmission link;

represents the transmission power consumption of the $k$-th device-to-device transmission link;

represents the power consumption of the $m$-th device when its task is offloaded; $P'$ represents the circuit power consumption of the device-to-device transmission link;

represents the maximum transmit power that the $m$-th device can provide;

represents the transmit power of the transmission link from the $m$-th device to the $m'$-th device;

represents the transmit power of the D2D link from the $j$-th inspection device to the $m$-th inspection device;

represents the peak interference power that the channel can tolerate; and

represents the interference power gain of the D2D link from the $j$-th inspection device to the $m'$-th inspection device.

Preferably, given the transmit power $P_m$ and the noise power $\sigma^2$, the signal-to-interference-plus-noise ratio $\gamma_m$ of the $m$-th device-to-base-station transmission link depends on the spectrum resource allocation of the device-to-device transmission links:

where $\mathcal{K} = \{1, 2, \ldots, (M-1)/2\}$ denotes the set of all possible links; $P_m$ and

are the transmit powers of the $m$-th device-to-base-station (D2B) link and the $k$-th device-to-device (D2D) transmission link, respectively; $h_m$ is the power gain of the $m$-th D2B channel and $h_k$ is the interference power gain of the $k$-th D2D transmission link; $\alpha_{k,m} = 1$ when the $k$-th D2D link reuses the spectrum of the $m$-th D2B link, and $\alpha_{k,m} = 0$ otherwise.

From the signal-to-interference-plus-noise ratio expression, the capacity $C_m$ of the $m$-th device-to-base-station transmission link is:

$C_m = w \log_2(1 + \gamma_m)$

where w is the subchannel bandwidth.

Preferably, for the $i$-th receiver of the $m$-th inspection device, the signal-to-interference-plus-noise ratio

is expressed as:

where

is the transmit power of the $i$-th receiver of the $m$-th inspection device; $g_{m,i}$ is the power gain of the $i$-th receiver of the $m$-th inspection device;

is the noise power in the received signal; $\rho$ is the interference power from device-to-base-station transmission links reusing the same resource block; and $\rho_D$ is the total interference power of all device-to-device transmission links sharing the same resource block;

where

represents the spectrum reuse coefficient;

indicates that the $n$-th device-to-base-station transmission link and the $i$-th receiver of the $m$-th inspection device share the same spectrum, and otherwise

is the interference power gain of the $n$-th device-to-base-station transmission link; $P_n$ is the transmit power of the $n$-th device-to-base-station transmission link;

where

is the transmit power of the transmission link from the $j$-th inspection device to the $m$-th inspection device, and

is the interference power gain of the D2D link from the $j$-th inspection device to the $m$-th inspection device. $\chi_{[j,m'][m,i]} = 1$ denotes that the $m'$-th receiver of the $j$-th inspection device and the $i$-th receiver of the $m$-th inspection device use the same spectrum resource, and $\chi_{[j,m'][m,i]} = 0$ otherwise.

Finally, the capacity of the $i$-th receiver of the $m$-th inspection device is:

Preferably, subject to the rate and delay constraints of the device-to-base-station transmission links, and considering the device-to-device and device-to-base-station transmission links together, the network throughput is:

The total consumption $E$ of the system is then:

where $\tau_m$ is the computation delay and

is the computation power consumption. When a task is computed locally, the processing delay is:

$\tau_m = \dfrac{u_m \xi_m}{f_m}$

where $u_m$ is the amount of data computed locally; $\xi_m$ is the computational complexity of the D2D device, i.e. the number of CPU cycles required to process 1 bit of data; and $f_m$ is the CPU frequency of the device.

From the local computation amount and the CPU parameters of the inspection device, the power consumption of the device during task offloading

is:

where $\kappa_m$ is the switched capacitance factor and $\eta_m$ is a coefficient factor.

Preferably, since power consumption affects network throughput, the reward function must balance power consumption against throughput; the computation delay condition is used as a penalty to reduce its impact on the reward, and the reward function is:

Let

represent the energy efficiency of the system; the system reward function can then be simplified as:

After normalization, this becomes:

A balance factor $\lambda$ is used to balance power consumption and throughput, giving a weighted benefit function.

Preferably, the state space of the DDQN is defined as $S = V_t \times C_t \times G_t \times H_{t-1}$, where $V_t = \{v_1, v_2\}$ denotes the offloading location, with $v_1$ indicating local offloading and $v_2$ indicating offloading to the integrated MEC server; $C_t = \{c_1, c_2, \ldots, c_g\}$ denotes the information set of $g$ subchannels, where $c_g = 0$ means that the subchannel is currently unoccupied and $c_g = x$ means that the subchannel is currently reused $x$ times; $G_t = \{g_1, g_2, \ldots, g_v\}$ denotes the set of $v$ link power gains; and $H_{t-1}$ denotes the interference signal strength received in the previous time slot, observed locally on each subchannel.

Preferably, the action selected by the DDQN includes the offloading location, spectrum and power information. The action is defined as $A = \{a_1, A_2, a_3\}$, where $a_1 \in \{0, 1\}$, with $a_1 = 0$ selecting local offloading and $a_1 = 1$ selecting offloading to the integrated MEC server; $A_2$ is the channel selection vector, i.e. the set of assigned subchannels; and $a_3 \in \{p_1, \ldots, p_i, \ldots, p_l\}$, where $a_3 = p_i$ means that the $i$-th allocated power $p_i$ is selected and $l$ is the number of subchannels. After selecting an action, the agent interacts with the environment to generate a reward and update the state.
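The action definition above can be illustrated with a short sketch. It rests on simplifying assumptions that are not in the source: each action picks a single subchannel index rather than a full channel selection vector, the power set is a small discrete list, and the helper names `build_action_space` and `epsilon_greedy` are hypothetical.

```python
import itertools
import random


def build_action_space(n_subchannels, power_levels):
    """Enumerate (a1, a2, a3) = (offload location, subchannel index, transmit power).

    a1 = 0 means local offloading, a1 = 1 means offloading to the MEC server.
    Assumption: one subchannel per action instead of a selection vector.
    """
    return list(itertools.product((0, 1), range(n_subchannels), power_levels))


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


actions = build_action_space(n_subchannels=3, power_levels=(0.1, 0.2))
greedy_index = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

Flattening the three components into one discrete index keeps the Q-network output a single vector, at the cost of an action count that grows multiplicatively with the number of subchannels and power levels.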

Preferably, the Loss function is a mean square error function:

The beneficial effects of the invention are as follows: multi-level offloading is realized through MEC, and reuse and distributed scheduling of communication resources are realized through D2D communication; a system benefit function combining network throughput, power consumption and computation delay is established, and the benefit maximization problem under link interference and power constraints is solved to obtain the offloading selection and resource allocation; and a DDQN-based deep reinforcement learning framework jointly optimizes 5G resource block allocation and computation offloading, maximizing network throughput and reducing computation delay.

Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the system model of a D2D-assisted MEC network according to an embodiment of the present invention.

Fig. 2 is a schematic diagram illustrating an influence of the number of inspection devices on a benefit function according to an embodiment of the present invention.

Fig. 3 is a schematic diagram illustrating an influence of the number of inspection devices on system throughput according to an embodiment of the present invention.

Fig. 4 is a schematic diagram illustrating an influence of the number of subcarriers on the system throughput according to the embodiment of the present invention.

Fig. 5 is a schematic diagram illustrating the influence of the number of inspection devices and the number of subcarriers on the MEC offloading probability according to an embodiment of the present invention.

Fig. 6 is a diagram illustrating the variation of the reward function with ε according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below by way of the drawings are illustrative only and are not to be construed as limiting the invention.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. The particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.

For the purpose of facilitating an understanding of the present invention, the present invention will be further explained by way of specific embodiments with reference to the accompanying drawings, which are not intended to limit the present invention.

It should be understood by those skilled in the art that the drawings are merely schematic representations of embodiments and that the elements shown in the drawings are not necessarily required to practice the invention.

Examples

This embodiment addresses the problems that arise when 5G technology is applied to substation inspection equipment. Taking the characteristics of MEC and D2D technology into account, it provides a D2D-assisted MEC network offloading algorithm in which multi-level offloading is achieved through MEC, and reuse and distributed scheduling of communication resources are achieved through D2D communication. To obtain the offloading selection and resource allocation, a system benefit function combining network throughput, power consumption and computation delay is established, and the benefit maximization problem under link interference and power constraints is solved. Finally, a DDQN-based deep reinforcement learning framework jointly optimizes 5G resource block allocation and computation offloading, maximizing network throughput and reducing computation delay as far as possible.

The D2D-based MEC system resource allocation model is shown in Fig. 1. Consider a substation located within the coverage of a base station with an integrated MEC server, in which $M$ inspection devices cooperate to complete the inspection work, with $M$ corresponding D2B links. The device set and the device-to-base-station (D2B) link set are defined as

The inspection devices can obtain the position information of other inspection devices through D2D communication. Each inspection device collects sensing data and can choose to process it locally or to offload it to the base station over the D2B link. Interference at the base station is more controllable and uplink resources are less heavily used, so it is assumed that each message is handled by a group of receivers, each of which can communicate with other devices separately (the total number of receivers does not exceed $M$). The D2D links reuse the uplink spectrum of the D2B links.

Radio resources are allocated in two dimensions, the time domain and the frequency domain. In the time domain, resources are allocated in each transmission time interval (TTI). In the frequency domain, the total bandwidth is divided into several subchannels of equal bandwidth, which must be allocated. A single TTI and a single subchannel together constitute a system resource block (RB), the minimum radio resource unit required for device data transmission. The interference on a D2B link therefore comes from background noise and from D2D link signals sharing the same subband.

Given the transmit power and the noise power $\sigma^2$, the signal-to-interference-plus-noise ratio $\gamma_m$ of the $m$-th D2B link is closely related to the spectrum resource allocation of the D2D links, and can be expressed as:

where $\mathcal{K} = \{1, 2, \ldots, (M-1)/2\}$ represents the set of all possible links; $P_m$ and

represent the transmit power of the $m$-th D2B link and the $k$-th D2D link, respectively; $h_m$ is the power gain of the $m$-th D2B channel and $h_k$ is the interference power gain of the $k$-th D2D link; $\alpha_{k,m}$ is the channel reuse coefficient, with $\alpha_{k,m} = 1$ when the $k$-th D2D link reuses the spectrum of the $m$-th D2B link and $\alpha_{k,m} = 0$ otherwise.

From the signal-to-interference-plus-noise ratio expression, the capacity $C_m$ of the $m$-th D2B link is:

$C_m = w \log_2(1 + \gamma_m)$ (2)

where w is the subchannel bandwidth.
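The D2B link model above can be checked numerically with a small sketch. The formula image for the SINR is not available in this text, so the code assumes the usual form implied by the definitions, $\gamma_m = P_m h_m / (\sigma^2 + \sum_k \alpha_{k,m} p_k h_k)$, where $p_k$ is a placeholder name for the D2D transmit power; the function names and the numbers are illustrative only.

```python
import math


def d2b_sinr(p_m, h_m, noise_power, d2d_links):
    """SINR of the m-th D2B link (assumed form, see lead-in).

    d2d_links is an iterable of (alpha_km, p_k, h_k) tuples: the reuse
    coefficient (0 or 1), the D2D transmit power and the interference gain.
    """
    interference = sum(alpha * p_k * h_k for alpha, p_k, h_k in d2d_links)
    return p_m * h_m / (noise_power + interference)


def link_capacity(bandwidth_hz, sinr):
    """Shannon capacity C = w * log2(1 + SINR), as in formula (2)."""
    return bandwidth_hz * math.log2(1.0 + sinr)


# One D2D link reuses the spectrum (alpha = 1), the other does not (alpha = 0).
sinr = d2b_sinr(p_m=1.0, h_m=1.0, noise_power=0.5,
                d2d_links=[(1, 1.0, 0.5), (0, 5.0, 1.0)])
capacity = link_capacity(bandwidth_hz=180e3, sinr=sinr)
```

With these numbers the reused link contributes interference 0.5, the SINR is 1 and the capacity equals the subchannel bandwidth, which makes the effect of the reuse coefficient easy to see.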

Similarly, for the $i$-th receiver of the $m$-th inspection device, the signal-to-interference-plus-noise ratio

is expressed as:

In formula (3),

is the transmit power of the $i$-th receiver of the $m$-th inspection device; $g_{m,i}$ is the power gain of the $i$-th receiver of the $m$-th inspection device;

is the received noise power; $\rho$ is the interference power from D2B links reusing the same RB; and $\rho_D$ is the total interference power of all D2D links sharing the same RB.

$\rho$ in formula (3) is given by formula (4):

where

represents the spectrum reuse coefficient;

indicates that the $n$-th D2B link and the $i$-th receiver of the $m$-th inspection device share the same spectrum, and otherwise

is the interference power gain of the $n$-th D2B link; and $P_n$ is the transmit power of the $n$-th D2B link.

$\rho_D$ in formula (3) is given by formula (5):

where

is the interference power gain of the D2D link from the $j$-th inspection device to the $m$-th inspection device; $\chi_{[j,m'][m,i]}$ is likewise a spectrum reuse coefficient, with $\chi_{[j,m'][m,i]} = 1$ when the $m'$-th receiver of the $j$-th inspection device and the $i$-th receiver of the $m$-th inspection device use the same spectrum resource, and $\chi_{[j,m'][m,i]} = 0$ otherwise.

According to the network model, D2D technology improves resource utilization through spectrum reuse, but link interference cannot be avoided. The quality of the D2D links should therefore be improved as far as possible while the rate and delay constraints of the D2B links are satisfied. Considering the D2D and D2B links together, the throughput of the D2D-assisted MEC network can be expressed as:

In this embodiment, the system energy consumption and computation model is as follows:

The inspection devices are mostly power-limited, so the power consumed by MEC task computation and offloading must be considered. Since the integrated MEC server is deployed in the network management center and has an active power supply, the power consumption limit of the MEC server can be ignored, and this embodiment mainly calculates the power consumption of the inspection devices.

Define the circuit power consumption of the D2D device as $P'$, the transmission power consumption of the D2D device as

and the computation power consumption as

The total consumption $E$ of the MEC system depends directly on these power consumption terms of the D2D inspection devices, and is expressed as:

Computation delay is another key indicator of task processing. It is closely related to the computing resources of the inspection device or server: the more computing resources, the smaller the processing delay. Task offloading is divided into two levels, local offloading on the D2D device and offloading to the integrated MEC server. Because the integrated server has a greater energy supply and computing capacity than the local device, the algorithm optimization mainly considers the computation delay and power consumption of local offloading.

Define $\tau_m$ as the computation delay:

$\tau_m = \dfrac{u_m \xi_m}{f_m}$

where $u_m$ is the amount of data computed locally; $\xi_m$ is the computational complexity of the D2D device, i.e. the number of CPU cycles required to process 1 bit of data; and $f_m$ is the CPU frequency of the D2D inspection device.
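As a worked example of the local processing delay, the sketch below assumes the delay is the local data volume times the cycles needed per bit, divided by the CPU frequency, which is the form the definitions of $u_m$, $\xi_m$ and $f_m$ suggest. The function name and the numbers are illustrative.

```python
def local_processing_delay(data_bits, cycles_per_bit, cpu_hz):
    """Delay in seconds for computing data_bits locally.

    data_bits      -- u_m, amount of locally computed data (bits)
    cycles_per_bit -- xi_m, CPU cycles needed per bit
    cpu_hz         -- f_m, CPU frequency of the device (cycles per second)
    """
    if cpu_hz <= 0:
        raise ValueError("cpu_hz must be positive")
    return data_bits * cycles_per_bit / cpu_hz


# 1 Mbit of data, 1000 cycles per bit, on a 1 GHz CPU.
delay_s = local_processing_delay(data_bits=1e6, cycles_per_bit=1000, cpu_hz=1e9)
```

Doubling the CPU frequency halves the delay, which is the trade-off against computation power consumption that the energy model in this section captures.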

The computation power consumption is:

where $\kappa_m$ is the switched capacitance factor and $\eta_m$ is a coefficient factor.

In this embodiment, the resource optimization allocation model is as follows:

Because the battery capacity of a D2D inspection device is limited, its transmit power cannot be unbounded, so the transmit power satisfies the following constraint:

where

is the maximum transmit power that the $m$-th D2D inspection device can provide.

In addition, interference power can affect the D2D devices, interrupting transmission and degrading communication quality, so the following interference power constraint must also be satisfied during resource allocation:

where

represents the peak interference power that the channel can tolerate, and the other variables are defined as in formula (5).
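The two constraints can be turned into a simple feasibility check. The exact inequalities are not reproduced in this text, so the sketch assumes the plainest reading: each device's total transmit power must not exceed its maximum, and the summed D2D interference (reuse coefficient times transmit power times interference gain) must not exceed the tolerable peak. All names are hypothetical.

```python
def satisfies_constraints(tx_powers, max_powers, interferers, peak_interference):
    """Return True when both assumed constraints hold.

    tx_powers, max_powers -- per-device total transmit power and its limit
    interferers           -- iterable of (chi, power, gain) tuples for D2D
                             links that may share the same spectrum
    peak_interference     -- largest interference power the channel tolerates
    """
    power_ok = all(0.0 <= p <= p_max for p, p_max in zip(tx_powers, max_powers))
    total_interference = sum(chi * p * g for chi, p, g in interferers)
    return power_ok and total_interference <= peak_interference


links = [(1, 0.2, 0.5), (0, 1.0, 1.0)]  # only the first link shares the spectrum
feasible = satisfies_constraints([0.1, 0.2], [0.2, 0.2], links, peak_interference=0.15)
```

A check like this can be used to mask infeasible actions or to apply a penalty in the reward; which of the two the source uses is not stated here.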

For intelligent inspection data transmission and task execution, the resource optimization algorithm in this embodiment must improve the throughput of the D2D-assisted MEC network under basic constraints such as power and interference, and must keep the computation delay of the sensing data as low as possible through suitable task offloading and resource allocation. Throughput, energy consumption and delay requirements are therefore considered jointly, and a resource optimization allocation model based on maximizing a comprehensive benefit function is established as follows:

s.t. C1:

C2:

Because capacity, power consumption and delay have different value ranges and units, they must be normalized during the optimization. Normalizing the raw data $x$ gives the result

The specific method is:

where $x_{max}$ is the maximum value of the data and $x_{mid}$ is half the maximum value of the data.

The optimization problem is a mixed integer nonlinear programming problem, and a plurality of optimization variables are coupled with each other, so that the problem is difficult to solve by using a traditional convex optimization scheme even under all the statistical distributions. In addition, the relationship between the observed value and the optimal resource allocation solution is often implicit and difficult to establish by an analytical method. Therefore, an unloading decision optimization and resource allocation algorithm based on a Deep Reinforcement Learning (DRL) framework is provided, and online interaction between states and a system is realized by using an implicit relation between an observed value and optimal resource allocation.

The DDQN algorithm is specifically as follows:

decision-making capability of deep reinforcement learning and comprehensive reinforcement learning and strong data analysis capability of deep neural network ^[16] The problem of dimension explosion caused by large state space in the Q-Learning algorithm can be solved. The updated mathematical expression is:

wherein R is _t+1 Is a reward, s _t+1 Is the next state, a is the selected action, γ is the attenuation factor for R,

is a parameter of the Q network.

DQN algorithm is updating

A maximum value is chosen and this max operation causes the value function to be overestimated. Thus, dual networks may be employed to select actions and evaluate current state values ^[10] I.e., the DDQN algorithm. The algorithm update process is as follows:

where θ_t and

are the parameters of the Q network and the Target Q network, respectively. The DDQN selects the action greedily with the Q network and evaluates the Q value of that action with the Target Q network.
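As a rough illustration of the difference between the two update targets, the sketch below uses the standard forms from the reinforcement learning literature: the DQN target R_{t+1} + γ·max_a Q(s_{t+1}, a; θ′) and the Double DQN target R_{t+1} + γ·Q(s_{t+1}, argmax_a Q(s_{t+1}, a; θ_t); θ′). The symbol θ′ for the Target Q network parameters, the function names and the numeric Q values are assumptions made for illustration only and are not taken from this embodiment.

```python
def dqn_target(reward, next_q_target, gamma):
    """Standard DQN target: the same estimates both select and evaluate the action."""
    return reward + gamma * max(next_q_target)


def ddqn_target(reward, next_q_online, next_q_target, gamma):
    """Double DQN target: the Q network selects the action,
    the Target Q network supplies the value of that action."""
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q values for three actions in state s_{t+1}.
q_online = [1.0, 2.5, 2.0]  # Q(s_{t+1}, a; theta_t), Q network
q_target = [1.2, 1.8, 3.0]  # Q(s_{t+1}, a; theta'), Target Q network

y_dqn = dqn_target(1.0, q_target, gamma=0.9)
y_ddqn = ddqn_target(1.0, q_online, q_target, gamma=0.9)
```

Because the selected action need not be the one with the largest Target Q value, the Double DQN target is never larger than the DQN target computed from the same estimates, which is how the overestimation is damped.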

In this embodiment, the optimization strategy based on DDQN specifically includes:

The D2B links have strict delay and reliability requirements, and in the DDQN these constraints are expressed directly in the reward function. The goal of the resource management scheme proposed by the present embodiment is to ensure that the delay constraint of the D2B links is met while minimizing the interference of the D2D links to the D2B links.

Since power consumption affects network throughput, the balance between power consumption and throughput needs to be adjusted in the reward function. The computation delay is applied as a penalty on the reward. The reward function may be expressed as:

Let

represent the energy efficiency of the system; the system reward function can then be simplified to the following form:

After normalization:

It can be seen that the reward function is similar but not identical to the benefit function: the power consumption and throughput are balanced with an equalization factor λ to obtain a weighted benefit function. As before, the data needs to be normalized, and the processing rule is the same as in formula (14).

The observations related to resource allocation are channel and interference information. The state space of the DDQN is defined as S = V_t × C_t × G_t × H_{t-1}, where:

1) V_t = {v₁, v₂} denotes the offloading position, where v₁ denotes local offloading and v₂ denotes offloading to the integrated MEC server;

2) C_t = {c₁, c₂, …, c_g} denotes the information set of the g subchannels, where c_g = 0 denotes that the subchannel is currently unoccupied and c_g = x denotes that the subchannel is reused x times at the current time;

3) G_t = {g₁, g₂, …, g_v} denotes the set of v link power gains;

4) H_{t-1}, the interference signal strength received in the previous time slot, denotes the local observations on each subchannel and also includes information shared by neighbors, such as the channel index selected by a neighbor in the previous slot.

The action selection of the DDQN covers the offloading position, spectrum and power. The action is defined as a = {a₁, A₂, a₃}, where a₁ ∈ {0, 1}: a₁ = 0 denotes selecting local offloading and a₁ = 1 denotes selecting offloading to the integrated MEC server. A₂ is the channel selection vector, i.e., the set of allocated subchannels. a₃ ∈ {p₁, …, p_i, …, p_l}, where a₃ = p_i denotes that the i-th power p_i is allocated, and l is the number of subchannels. After selecting an action, the agent interacts with the environment to generate a reward and update the state. The DDQN algorithm is trained according to the reward function, state space and actions set above; the specific environment settings and parameters are described below.
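One possible way to turn the state and action definitions above into data a Q network can consume is sketched below. It is a simplification under stated assumptions: the channel selection vector A₂ is reduced to a single subchannel index, the power levels (in dBm) are invented values, and the helper names `build_action_space` and `encode_state` are hypothetical.

```python
from itertools import product


def build_action_space(num_subchannels, power_levels):
    """Enumerate discrete actions (a1, a2, a3).

    a1: 0 = local offloading, 1 = offloading to the integrated MEC server
    a2: index of the selected subchannel (simplification of the vector A2)
    a3: allocated transmit power level
    """
    return list(product((0, 1), range(num_subchannels), power_levels))


def encode_state(offload_position, channel_occupancy, link_gains, prev_interference):
    """Flatten (V_t, C_t, G_t, H_{t-1}) into one feature vector."""
    return [
        float(offload_position),
        *map(float, channel_occupancy),
        *link_gains,
        *prev_interference,
    ]


# Hypothetical setup: 4 subchannels and 3 power levels in dBm.
actions = build_action_space(num_subchannels=4, power_levels=(5.0, 10.0, 23.0))
state = encode_state(1, [0, 2, 1, 0], [0.5, 0.3], [0.1, 0.2, 0.0, 0.4])
```

With this encoding, the Q network would output one Q value per entry of `actions`, and the index of the chosen entry identifies the offloading position, subchannel and power together.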

The resource optimization allocation method provided by this embodiment is as follows:

The method comprises two stages, namely a training stage and a testing stage. Training and testing data are generated through interaction between the environment simulator and the agent and are used to optimize the Q-network and the Target Q-network. In the initial phase, each training sample comprises s_t, s_{t+1}, a_t and r_t, and the samples are stored to form the Experience Replay pool. Action selection uses the ε-greedy method: an action is selected at random with a probability of 10%, and the action with the maximum Q value is selected with a probability of 90%.
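The ε-greedy rule with a 10% exploration probability can be sketched as follows. The function name `epsilon_greedy` and the example Q values are illustrative assumptions; only the 10%/90% split comes from the text.

```python
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Return an action index: random with probability epsilon,
    otherwise the index of the largest Q value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q values for three candidate actions.
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1)
```

Note that the random branch may also land on the greedy action, so in this sketch the greedy action is chosen slightly more often than 90% of the time.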

The environment simulator comprises the D2D devices, the integrated MEC server and their channels, where the D2D device positions are randomly generated. Given the spectrum and power selected for the D2D links, the simulator provides s_{t+1} and R_t to the agent. In each iteration of the training phase, 50 samples are drawn from the Experience Replay, which suppresses the temporal correlation of the generated data. The action is then selected by the Q-network, the Target Q-network is used for evaluation and weight updating, and the loss function is the mean square error:

The spectrum and power selection strategy of each D2D link is initialized randomly, and the utility function is computed iteratively through the Q-network. In the testing phase, the actions of the D2D links are selected according to the trained network and evaluated accordingly.
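A minimal sketch of the Experience Replay pool and the mean square error loss described above is given below. The batch size of 50 follows the text; the pool capacity, the class name `ExperienceReplay` and the function name `mse_loss` are assumptions for illustration.

```python
import random
from collections import deque


class ExperienceReplay:
    """Fixed-size pool of (s_t, a_t, r_t, s_{t+1}) transitions."""

    def __init__(self, capacity=10000, seed=None):
        # capacity is a hypothetical value; the oldest samples are dropped first
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, s_t, a_t, r_t, s_next):
        self.buffer.append((s_t, a_t, r_t, s_next))

    def sample(self, batch_size=50):
        """Draw a random mini-batch without replacement to break time correlation."""
        size = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), size)


def mse_loss(targets, predictions):
    """Mean square error between target values and predicted Q values."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


pool = ExperienceReplay(capacity=100, seed=1)
for t in range(200):
    pool.store(t, 0, 1.0, t + 1)  # dummy transitions
batch = pool.sample()
```

In a full training loop, the targets passed to `mse_loss` would be computed with the Target Q-network and the predictions with the Q-network for the sampled batch.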

The main steps of resource optimization allocation comprise the following parts:

1) System modeling: the system comprises M D2D devices, an inspection target and a base station with an integrated MEC server.

2) Parameter definition: parameters such as channel, fading and noise are defined (specific values are shown in Table 1), and the system resource parameters or variables (optimization targets) are defined.

3) Index calculation: according to the model and the parameters, calculate the signal-to-interference-plus-noise ratio (SINR) and capacity of the m-th D2B link, the SINR and capacity of the i-th receiver of the m-th D2D device, and the network throughput and power consumption.

4) Algorithm description: channel and power allocation is then performed using the double-network DQN (DDQN), and the specific process is summarized as follows:

In this embodiment, simulation and analysis of the resource allocation method are provided:

The simulation is configured as follows. The simulation is based on the TensorFlow 1.0 framework in Python. A 500 m × 500 m substation environment is considered, in which M D2D inspection devices are randomly generated, with a base station with an integrated MEC server located 2 km from the center of the substation. The channel follows a Rician model, and the simulation parameters are shown in Table 1.
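The geometry of this setup can be sketched as below. The text states only that device positions are random, so the uniform distribution, the coordinate origin at the substation center, and the placement of the base station on the positive x axis are assumptions; `place_devices` and `distance` are hypothetical helper names.

```python
import math
import random


def place_devices(num_devices, side_m=500.0, bs_distance_m=2000.0, seed=None):
    """Drop devices uniformly in a side_m x side_m square centered at the origin
    and put the base station bs_distance_m away from the center."""
    rng = random.Random(seed)
    half = side_m / 2
    devices = [
        (rng.uniform(-half, half), rng.uniform(-half, half))
        for _ in range(num_devices)
    ]
    base_station = (bs_distance_m, 0.0)  # direction chosen arbitrarily
    return devices, base_station


def distance(p, q):
    """Euclidean distance in meters between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


devices, base_station = place_devices(num_devices=20, seed=3)
d2b_distances = [distance(d, base_station) for d in devices]
```

The resulting distances would feed the path loss and Rician fading terms of the channel model, whose parameters are those of Table 1.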

The network model is a back-propagation (BP) neural network comprising an input layer, three hidden layers and an output layer. The three hidden layers have 64, 128 and 128 neurons, respectively, and the activation function is the ReLU function.
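To make the layer sizes concrete, a framework-free sketch of the forward pass is shown below, taking the activation to be ReLU. The embodiment itself uses TensorFlow; the weight initialization, the input dimension of 10, the 24 output actions and all function names here are assumptions for illustration only.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Random weights (He-style scaling, an assumption) and zero biases."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def build_q_network(state_dim, num_actions, hidden=(64, 128, 128), seed=0):
    rng = random.Random(seed)
    sizes = (state_dim, *hidden, num_actions)
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def q_values(network, state):
    """Forward pass: ReLU on the hidden layers, linear output (one Q value per action)."""
    x = list(state)
    for layer in network[:-1]:
        x = relu(dense(x, layer))
    return dense(x, network[-1])


network = build_q_network(state_dim=10, num_actions=24)
q = q_values(network, [0.5] * 10)
```

The Target Q-network would share this structure and periodically receive a copy of the Q-network weights.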

Table 1 D2D-assisted MEC network parameters

The DDQN algorithm provided by this embodiment is compared with the MEC-U algorithm and a Random algorithm. In the MEC-U algorithm, all tasks are offloaded to the integrated MEC server, and the remaining parts are consistent with the algorithm of this embodiment; the Random algorithm selects communication resources and offloading positions at random. The results are shown in Fig. 2, which shows that the algorithm of this embodiment performs well.

When the number of inspection devices is small, the resources provided by the system can meet the communication demand, and the system benefit functions achieved by the two algorithms are close to each other. As the number of inspection devices increases, the growing communication demand leads to a shortage of resources, and the reuse of spectrum resources lowers the utility function. However, compared with the MEC-U and Random algorithms, the DDQN framework proposed by this embodiment optimizes the resource allocation strategy to reduce channel interference by mining the implicit relationship between interference and the allocation strategy, and at the same time reduces the computation delay through the offloading decision, which keeps the utility function at a higher level. The simulation results show that the DDQN algorithm provided by this embodiment is reliable and effective.

Fig. 3 shows the relationship between system throughput and the number of inspection devices, compared with MEC-U and Random selection. The results show that the system throughput first increases and then decreases as the number of inspection devices grows. When the number of inspection devices is small, system resources are not fully utilized and the amount of data to be transmitted in the network is limited. After the number of inspection devices reaches a certain level, the system throughput decreases because the limited network resources lead to increased channel interference. The offloading and resource allocation strategy obtained by the DDQN algorithm is clearly superior to MEC-U and Random allocation: by reasonably selecting the offloading position and an efficient scheduling strategy, it resists channel interference better.

This embodiment then studies how the system throughput varies with the number of subcarriers and compares the result with the Random and AFSA algorithms. As shown in Fig. 4, the greater the number of subcarriers, the greater the system throughput. This is because, when resources are sufficient, there is less interference between channels, more data is offloaded to the integrated MEC server for computation, and the system throughput increases. Compared with the resource allocation strategies of the AFSA and Random algorithms, the DDQN algorithm provided by this embodiment yields less channel interference and higher throughput.

This embodiment also studies the influence of the number of inspection devices and the number of subcarriers d on the offloading policy; the result is shown in Fig. 5. The probability of choosing to offload to the integrated MEC server increases with the number of subcarriers and decreases with the number of inspection devices. This is because, when resources are plentiful relative to the communication demand, offloading to the integrated MEC server reduces the computation delay; when resources are scarce, interference within the system increases, and handling more tasks locally reduces the interference and thus ensures the reliability and effectiveness of the system.

Fig. 6 shows the reward obtained during training by the different algorithms. It can be seen that the DDQN algorithm learns to select actions during training that improve the reward, mining the implicit relationship between resource allocation and reward, and achieves a higher reward than random allocation.

In the intelligent power inspection process, sinking resources to the network edge through MEC relieves the pressure on the core network and provides fast computing services. In this embodiment, to meet the need for interconnection between inspection devices, the MEC and D2D technologies are combined to establish a D2D-assisted MEC network. To reduce interference between different links, a 5G resource optimization problem with throughput, power consumption and computation delay as indexes is formulated and solved with a DDQN framework, and the effectiveness of the algorithm is verified by simulation. In future work, game-theoretic analysis will be applied to the data transmission of different inspection devices, and the travel trajectory will be optimized.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts based on the technical solutions disclosed in the present invention.

Claims

1. A network resource allocation method integrating terminal direct transmission communication and multi-access edge computing, characterized by comprising the following steps:

iteratively selecting equipment for terminal direct transmission communication;

according to the selected equipment, selecting the offloading position and the spectrum resource by utilizing a pre-trained resource allocation strategy of a DDQN-based deep reinforcement learning framework;

wherein the training of the resource allocation strategy comprises:

initializing the strategy randomly, starting an environment simulator, and generating the equipment for terminal direct transmission communication, an inspection target and a base station with integrated multi-access edge computing; initializing the Q-network and the Target Q-network to generate initial weights;

the environment simulator selects an action from the Q-network according to the ε-greedy method, enters a new state, calculates the network throughput and energy consumption according to the current spectrum occupation, generates a reward according to the set reward function and calculates a new Q value, and stores the calculated network throughput, energy consumption and updated Q value in an Experience Replay;

sampling data from the Experience Replay to perform network training; and updating the weights of the Target Q-network at intervals until the loss function converges, to obtain the finally trained resource allocation model.

2. The network resource allocation method integrating terminal direct transmission communication and multi-access edge computing according to claim 1, wherein the resource optimization strategy jointly considers the throughput, energy consumption and delay requirements, and the resource optimization allocation model based on maximizing a comprehensive benefit function is established as follows:

s.t.C1:

C2:

wherein M inspection devices in the substation cooperate to complete the inspection work, with M corresponding device-to-base-station transmission links; C_m represents the capacity of the m-th device-to-base-station transmission link;

represents the capacity of the i-th receiver of the m-th device; τ_m represents the computation delay; α_{k,m} represents the channel multiplexing coefficient, with α_{k,m} = 1 when the k-th device-to-device transmission link reuses the spectrum of the m-th device-to-base-station transmission link and α_{k,m} = 0 otherwise; χ_{[j,m′][m,i]} = 1 denotes that the m′-th receiver of the j-th inspection device and the i-th receiver of the m-th inspection device use the same spectrum resource, and χ_{[j,m′][m,i]} = 0 otherwise; P_m represents the transmission power of the m-th device-to-base-station transmission link;

represents the power consumption when the task of the m-th device is offloaded; p' represents the circuit power consumption of the device-to-device transmission link;

represents the maximum transmit power that the m-th device can provide;

represents the transmission power of the transmission link from the m-th device to the m′-th device;

represents the peak interference power that the channel can tolerate;

represents the interference power gain of the D2D link from the j-th inspection device to the m′-th inspection device.

3. The method of claim 2, wherein, given known transmit power and noise power σ², the signal-to-interference-plus-noise ratio γ_m of the m-th device-to-base-station transmission link depends on the spectrum resource allocation of the device-to-device transmission links:

where 𝒦 = {1, 2, …, K}, with K = M·(M−1)/2, represents the set of all possible links; h_m is the power gain of the m-th device-to-base-station transmission link channel, and h_k represents the interference power gain of the k-th device-to-device transmission link;

according to the expression of the signal-to-interference-plus-noise ratio, the capacity C_m of the m-th device-to-base-station transmission link is:

C_m = w·log₂(1 + γ_m)

where w is the subchannel bandwidth.

4. The method according to claim 3, wherein the SINR of the i-th receiver of the m-th inspection device,

is expressed as:

where

is the transmission power of the i-th receiver of the m-th inspection device, and g_{m,i} is the power gain of the i-th receiver of the m-th inspection device;

is the received noise power, ρ is the interference power of the device-to-base-station transmission links multiplexing the same resource block, and ρ_D is the total interference power of all device-to-device transmission links sharing the same resource block;

where

represents the spectrum multiplexing coefficient;

indicates that the n-th device-to-base-station transmission link and the i-th receiver of the m-th inspection device share the same spectrum, and otherwise

is the interference power gain of the n-th device-to-base-station transmission link; p_n is the transmit power of the device-to-base-station transmission link;

where

is the interference power gain of the D2D link from the j-th inspection device to the m-th inspection device.

5. The method for allocating network resources integrating terminal direct transmission communication and multi-access edge computing as claimed in claim 4, wherein, subject to the rate and delay constraints of the device-to-base-station transmission links, and considering the device-to-device and device-to-base-station transmission links together, the network throughput is:

the total consumption E of the system is:

where τ_m is the computation delay and

is the computation power consumption; when the task is computed locally, the processing delay is:

where u_m is the amount of data computed locally; ξ_m is the computational complexity of the D2D device, i.e., the number of CPU cycles required to process 1 bit of data; and f_m represents the CPU frequency of the device.

is:

where κ_m represents the switched capacitance factor and η_m is a coefficient factor.

6. The method according to claim 5, wherein, because power consumption affects the network throughput, the balance between power consumption and throughput is adjusted in the reward function, and the computation delay is applied as a penalty on the reward; the reward function is:

let

after normalization:

7. The method according to claim 6, wherein the state space of the DDQN is defined as S = V_t × C_t × G_t × H_{t-1}; wherein V_t = {v₁, v₂} denotes the offloading position, v₁ denoting local offloading and v₂ denoting offloading to the integrated MEC server; C_t = {c₁, c₂, …, c_g} denotes the information set of the g subchannels, c_g = 0 denoting that the subchannel is currently unoccupied and c_g = x denoting that the subchannel is reused x times at the current time; G_t = {g₁, g₂, …, g_v} denotes the set of v link power gains; and H_{t-1}, the strength of the interference signal received in the previous time slot, denotes the local observation on each subchannel.

8. The method for allocating network resources according to claim 7, wherein the action selection of the DDQN includes the offloading position, spectrum and power information; the action is defined as a = {a₁, A₂, a₃}, where a₁ ∈ {0, 1}, a₁ = 0 denoting selecting local offloading and a₁ = 1 denoting selecting offloading to the integrated MEC server; A₂ is the channel selection vector, i.e., the set of allocated subchannels; a₃ ∈ {p₁, …, p_i, …, p_l}, a₃ = p_i denoting that the allocated power is p_i, and l is the number of subchannels; after selecting an action, the agent interacts with the environment to generate a reward and update the state.

9. The method for distributing network resources based on the combination of terminal direct communication and multiple access edge computing as claimed in claim 8, wherein the Loss function is a mean square error function: