CN116600344A

CN116600344A - Multi-layer MEC resource offloading method with power cost difference

Info

Publication number: CN116600344A
Application number: CN202310443772.1A
Authority: CN
Inventors: 王波; 郭城瑞; 黄冬艳; 王旭; 谢心颖; 吕佳齐; 卢泽林; 方宇航; 谢杰成
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2023-04-20
Filing date: 2023-04-20
Publication date: 2023-08-15

Abstract

The application discloses a multi-layer MEC resource offloading method with power cost differences. The method first establishes a network model of multi-layer MEC resources with power cost differences; then establishes communication and computation models for the different resource levels; allocates channel resources with a NOMA-based sub-channel user reorganization algorithm; and finally, using a Q-Learning-based computation offloading and resource allocation algorithm, converts the optimization problem into an equivalent reinforcement learning problem and trains the agent until the Q table converges, so as to guide the offloading decisions of the base station agent. Taking the weighted sum of the time cost and offloading cost of all users as the optimization objective, the problem is formulated as a mixed integer programming problem through joint offloading decision and resource allocation, and a solution based on NOMA and Q-Learning for optimizing transmission and offloading is provided. Simulation results show that the multi-layer MEC architecture outperforms the traditional single-layer MEC architecture, and that the algorithm outperforms other baseline algorithms.

Description

A multi-layer MEC resource offloading method with power cost difference

Technical Field

The present invention relates to the fields of the Internet of Things and resource allocation, and in particular to a multi-layer MEC resource offloading method with power cost differences.

Background Art

In recent years, the explosive growth of IoT services has given rise to many emerging service types, such as 3D gaming, telemedicine, and AI training. Because user equipment (UE) is limited in computing power and battery capacity, it struggles to execute computation-intensive tasks. To address this, Multi-Access Edge Computing (MEC) was proposed to extend the computing power of user equipment. Offloading computing tasks to more powerful MEC servers can significantly reduce the computing latency and energy consumption of user equipment. However, as the number of connected users keeps growing, existing local MEC facilities can no longer carry computing demand on this scale. Expanding the pool of MEC resources that serve users has therefore become an urgent problem.

Low earth orbit (LEO) satellites meet their own power needs through solar and chemical power generation, so their power cost is relatively low.

With LEO-MEC and the western cloud servers (W-CS) brought into the network, part of the computing demand of users in the east can be migrated, which effectively relieves the computing burden on local MEC. Because computing resources at different levels differ greatly in response latency and power cost, this poses a major challenge for problem modeling.

The paper "Xie R, Tang Q, Wang Q, et al. Satellite-terrestrial integrated edge computing networks: Architecture, challenges, and open issues [J]. IEEE Network, 2020, 34(3): 224-231" studied the MEC architecture under satellite-terrestrial integration, included satellite computing resources in multi-layer heterogeneous edge computing clusters, and identified promising technical challenges, including collaborative computation offloading and multi-node task scheduling. The papers "Tang Z, Zhou H, Ma T, et al. Leveraging LEO Assisted Cloud-Edge Collaboration for Energy Efficient Computation Offloading [C] // 2021 IEEE Global Communications Conference. IEEE, 2021: 1-6" and "Cheng N, Lyu F, Quan W, et al. Space/aerial-assisted computing offloading for IoT applications: A learning-based approach [J]. IEEE Journal on Selected Areas in Communications, 2019, 37(5): 1117-1129" each present an integrated MEC network that includes LEO satellites, in which computing tasks are offloaded to cloud servers via relaying by drones or LEO satellites. However, on-board processing capability is not considered, which wastes space resources; moreover, offloading to cloud servers via relay adds propagation delay, which degrades the users' quality of experience. The paper "Song Z, Hao Y, Liu Y, et al. Energy-efficient multiaccess edge computing for terrestrial-satellite Internet of Things [J]. IEEE Internet of Things Journal, 2021, 8(18): 14202-14218" offloads part of the local computing tasks to the satellite MEC server through a terrestrial satellite terminal (TST) and proposes an energy-efficient computation offloading and resource allocation algorithm that minimizes the weighted sum of local device energy without violating the tolerable task delay. The paper "Tang Q, Fei Z, Li B, et al. Computation offloading in LEO satellite networks with hybrid cloud and edge computing [J]. IEEE Internet of Things Journal, 2021, 8(11): 9164-9176" presents a hybrid cloud and LEO satellite MEC network with a three-layer computing architecture and proposes a distributed algorithm that uses the alternating direction method of multipliers to approximate the optimal solution, minimizing the total energy consumption of computation offloading by ground users. These works all offload part of the tasks to satellites or ground clouds through optimization strategies in order to reduce the energy consumption of local devices, but they do not consider the energy consumed by computation on the satellite and ground cloud servers.

The paper "Wu J, Jia M, Zhang L, et al. DNNs Based Computation Offloading for LEO Satellite Edge Computing [J]. Electronics, 2022, 11(24): 4108" considers the computation energy consumption of LEO-MEC in the system, proposes a deep-learning-based offloading algorithm for LEO satellite edge computing networks, and optimizes the weighted sum of system energy consumption and latency. The paper "Cao X, Yang B, Shen Y, et al. Edge-Assisted Multi-Layer Offloading Optimization of LEO Satellite-Terrestrial Integrated Networks [J]. IEEE Journal on Selected Areas in Communications, 2022" studies a multi-layer, multi-access MEC system, extends the MEC concept to the LEO satellite edge, and formulates a joint optimization problem over computing and communication resources. The original problem is decomposed and solved with the classic alternating optimization method, which verifies that the scheme achieves low computing latency and low energy consumption.

The paper "Hossain M D, Sultana T, Hossain M A, et al. Fuzzy decision-based efficient task offloading management scheme in multi-tier MEC-enabled networks [J]. Sensors, 2021, 21(4): 1484" proposes a cloud-MEC collaborative task offloading scheme based on fuzzy decision-making. Users select the optimal target node for task offloading according to server capacity, delay sensitivity, and network conditions, which significantly raises the rate of successfully executed offloaded tasks and reduces task completion time. The paper "Tong M, Wang X, Li S, et al. Joint Offloading Decision and Resource Allocation in Mobile Edge Computing-Enabled Satellite-Terrestrial Network [J]. Symmetry, 2022, 14(3): 564" proposes a joint offloading decision and resource allocation scheme for satellite-terrestrial MEC networks that minimizes the completion delay of all terminal tasks and meets the latency requirements of most users.

Summary of the Invention

In order to make full use of LEO satellites and the cheap electricity of the western region, to give user equipment more opportunities for computation offloading, and to solve the offloading selection problem caused by the differences among multi-level computing resources, the present invention provides a multi-layer MEC resource offloading method with power cost differences.

The technical solution that achieves the purpose of the present invention is as follows.

A multi-layer MEC resource offloading method with power cost differences comprises the following steps:

(1) establishing a network model of multi-layer MEC resources with power cost differences;

(2) establishing communication models and computation models for the different resource levels;

(3) allocating channel resources with the NOMA-based sub-channel user reorganization algorithm;

(4) using the Q-Learning-based computation offloading and resource allocation algorithm to convert the optimization problem into an equivalent reinforcement learning problem, and training the agent until the Q table converges, so as to guide the offloading decisions of the base station agent.

Further:

Establishing the network model of multi-layer MEC resources with power cost differences in step (1) includes:

1) The user equipment offloads computing tasks through the base station to a target server for processing. The base station (BS), which is the access point of the user equipment, integrates a traditional cellular base station and a terrestrial satellite terminal, and uses the C band and the Ka band to communicate with ground equipment and LEO satellites, respectively;

2) Assume there are M users within the coverage of the base station. Each user has one computing task $d_i$ ($i \in M$) to be offloaded, and the task is indivisible. The task data of all users form a set, and three layers of computing resources provide computing services to the users;

3) The user equipment uploads computing tasks to the base station using NOMA, and the base station controller schedules the user tasks centrally according to the task attributes and the real-time status of the resources at each level. When a computing task is offloaded to the local MEC server, a = 1, otherwise a = 0; when it is offloaded to the satellite MEC server, b = 1, otherwise b = 0; when it is offloaded to the western cloud server, c = 1, otherwise c = 0. Together, a, b, and c form the offloading decision.

The three layers of computing resources that provide computing services to users are:

① the local MEC server on the base station side;

② the satellite MEC server carried on the LEO satellite;

③ the western cloud server built in the western region.

Establishing the communication models for the different resource levels in step (2) includes:

1) A NOMA transmission scheme is adopted. Each subchannel is assumed to be occupied by two users simultaneously, and the users within a subchannel are orthogonal to each other, so the M users are divided into N pairs, that is, there are N subchannels.

2) At the receiver, decoding follows the NOMA uplink principle of decoding in descending order of channel gain. When the first user is decoded, the second user is treated as interference, so the transmission rate of the first decoded user is:

When the second user is decoded, no other user signal interferes, and its transmission rate is:

where $B$ is the subcarrier bandwidth, $P$ is the user's transmit power, and $n_0$ is the noise power.
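The two rate expressions described above can be illustrated with a minimal sketch. It assumes the standard uplink NOMA form with successive interference cancellation, $R_1 = B\log_2\left(1 + \frac{P h_1}{P h_2 + n_0}\right)$ for the first decoded user and $R_2 = B\log_2\left(1 + \frac{P h_2}{n_0}\right)$ for the second. The channel gains `h1` and `h2`, the function name, and the numeric values are assumptions introduced for illustration; the text itself names only $B$, $P$, and $n_0$.

```python
import math

def noma_uplink_rates(B, P, h1, h2, n0):
    """Rates (bit/s) of two users sharing one subchannel.

    Assumed form: h1 >= h2 are channel power gains (hypothetical symbols).
    The stronger user is decoded first, with the weaker user as interference;
    the weaker user is then decoded free of inter-user interference.
    """
    if h1 < h2:
        raise ValueError("expected h1 >= h2 (descending channel gain order)")
    r_first = B * math.log2(1 + P * h1 / (P * h2 + n0))
    r_second = B * math.log2(1 + P * h2 / n0)
    return r_first, r_second

# Illustrative values only (not taken from the source).
r1, r2 = noma_uplink_rates(B=1e6, P=0.1, h1=1e-6, h2=1e-7, n0=1e-9)
print(r1, r2)
```

Under this assumed form, the sum of the two rates equals $B\log_2\left(1 + P(h_1 + h_2)/n_0\right)$, which is why the pairing of users on a subchannel matters for the sum rate.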

3) For the user equipment within the coverage of the base station, the base station controller processes the user tasks in batches. Because the distance between the base station and the satellite is fixed, the transmission rate of the ground-to-satellite link is set to a constant $R_{GS}$ and the transmission rate of the satellite-to-ground link is set to a constant $R_{SG}$.

Establishing the computation models for the different resource levels in step (2) includes:

1) The total cost of offloading a user task is divided into a time cost and an offloading cost, and the weight factors $\omega$ and $\upsilon$ express their respective shares of the total cost.

The time cost consists of the transmission, propagation, and processing delays and depends on the distance from the user to the target server; the offloading cost consists of the transmission overhead and the computation overhead and depends on the energy consumption and the local unit electricity price;

2) The computing capacity that the local MEC server allocates to UE $i$ is $f_i^{L}$, and the number of CPU cycles required per bit of data is $\beta$ (cycles/bit), so the time cost of local offloading is:

where the first term is the transmission delay from the UE to the BS and the second term is the processing delay of the computing task;

The transmit power of UE $i$ is $P_i$, so the transmission energy consumption from the UE to the BS is:

With dynamic voltage and frequency scaling, the dynamic power consumption $P$ is proportional to $V^2 f$, and at low voltage the computing frequency $f$ of the CPU chip is approximately linear in the supply voltage $V$, that is, $V = af$. The CPU power consumption is therefore modeled as $P = \varepsilon f^3$, where $f$ is the CPU computing frequency and $\varepsilon$ is a coefficient that depends on the chip architecture. The computation energy consumption of user $i$ on the local MEC server $l$ is then:

$$E_{i,l}^{C} = \varepsilon \left(f_i^{L}\right)^2 d_i \beta \qquad (6)$$
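The step from the cubic power model to equation (6) can be spelled out as follows. This is a short derivation under the assumption that the task of $d_i$ bits runs at a constant frequency $f_i^{L}$ for its whole duration; the symbol $t_i^{L}$ for the processing time is introduced here only for illustration.

```latex
t_i^{L} = \frac{d_i \beta}{f_i^{L}}, \qquad
E_{i,l}^{C} = P \, t_i^{L}
            = \varepsilon \left(f_i^{L}\right)^{3} \cdot \frac{d_i \beta}{f_i^{L}}
            = \varepsilon \left(f_i^{L}\right)^{2} d_i \beta
```

The same reasoning applies to the satellite and western cloud servers, with $f_i^{S}$ or $f_i^{W}$ in place of $f_i^{L}$.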

The local unit electricity price is $p^{L}$, so the cost function of local offloading is:

$$U_i^{L}(d,f) = \omega T_i^{L} + \upsilon\left(E_i^{T} p^{L} + E_{i,l}^{C} p^{L}\right) \qquad (7)$$
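As a numeric illustration of equation (7), the sketch below evaluates the local offloading cost for one task. The expressions for the time cost and the transmission energy are not reproduced in the text, so the code assumes $T_i^{L} = d_i/R_i + d_i\beta/f_i^{L}$ and $E_i^{T} = P_i d_i / R_i$, where $R_i$ is the uplink rate of UE $i$. The function name, parameter names, and values are hypothetical.

```python
def local_offloading_cost(d, R, f, beta, P_tx, eps, price_local, w_time, w_cost):
    """Weighted cost of processing one task on the local MEC server.

    Assumed forms (not shown in the source text):
      transmission delay = d / R, processing delay = d * beta / f,
      transmission energy = P_tx * d / R.
    The computation energy follows equation (6).
    """
    t_tx = d / R                      # UE -> BS transmission delay (s)
    t_proc = d * beta / f             # processing delay (s)
    e_tx = P_tx * t_tx                # UE -> BS transmission energy (J)
    e_comp = eps * f ** 2 * d * beta  # computation energy, equation (6)
    time_cost = t_tx + t_proc
    money_cost = e_tx * price_local + e_comp * price_local
    return w_time * time_cost + w_cost * money_cost

# Illustrative values only (not taken from the source).
cost = local_offloading_cost(d=1e6, R=1e6, f=1e9, beta=1000, P_tx=0.1,
                             eps=1e-27, price_local=1.0,
                             w_time=0.5, w_cost=0.5)
print(cost)
```

Raising `f` shortens the processing delay but increases the computation energy quadratically, which is the trade-off the weights $\omega$ and $\upsilon$ balance.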

3) The distance from the BS to the access satellite is $H$, and the computing capacity that the satellite MEC server allocates to UE $i$ is $f_i^{S}$, so the time cost of satellite offloading is:

where the first term is the transmission delay from the UE to the BS, the second and third terms are the uplink transmission delay and the propagation delay of the ground-to-satellite link, and the fourth term is the processing delay of the computing task;

The transmit power of the BS is $P_{GS}$, so the transmission energy consumption from the BS to satellite $s$ is:

The computation energy consumption of UE $i$ on the satellite MEC server $s$ is:

$$E_{i,s}^{C} = \varepsilon \left(f_i^{S}\right)^2 d_i \beta \qquad (10)$$

The unit electricity price of the satellite server is $p^{S}$, so the cost function of satellite offloading is:

$$U_i^{S}(d,f) = \omega T_i^{S} + \upsilon\left(E_i^{T} p^{L} + E_{gs}^{T} p^{L} + E_{i,s}^{C} p^{S}\right) \qquad (11)$$

4) The computing capacity that the western cloud server allocates to UE $i$ is $f_i^{W}$, so the time cost of western offloading is:

where the first term is the transmission delay from the UE to the BS, the second and third terms are the transmission delays of the ground-to-satellite and satellite-to-ground links, the fourth term is the propagation delay from the BS to the western cloud server via the satellite relay, and the fifth term is the processing delay of the computing task on the western cloud server.

The transmit power of the satellite is $P_{SG}$, so the transmission energy consumption from satellite $s$ to the western cloud server $w$ is:

The computation energy consumption of UE $i$ on the western cloud server $w$ is:

$$E_{i,w}^{C} = \varepsilon \left(f_i^{W}\right)^2 d_i \beta \qquad (13)$$

The unit electricity price of the western cloud server is $p^{W}$, so the cost function of western offloading is:

$$U_i^{W}(d,f) = \omega T_i^{W} + \upsilon\left(E_i^{T} p^{L} + E_{gs}^{T} p^{L} + E_{sg}^{T} p^{S} + E_{i,w}^{C} p^{W}\right) \qquad (14)$$
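To see how the three cost functions (7), (11), and (14) compare for a single task, the following sketch evaluates all of them side by side. The delay and transmission-energy expressions are not reproduced in the text, so the code assumes each transmission delay is the task size divided by the link rate, each transmission energy is transmit power times transmission delay, and the ground-to-satellite propagation delay is $H$ divided by the speed of light. The relay propagation delay to the western cloud is passed in directly as the hypothetical parameter `prop_west`. All names and values are illustrative assumptions.

```python
C_LIGHT = 3.0e8  # assumed propagation speed in m/s (not given in the source)

def tier_costs(d, beta, R, R_gs, R_sg, H, prop_west, f, P, eps, price,
               w_time, w_cost):
    """Return the cost of one task on each tier: 'L', 'S', 'W'.

    f and price are dicts keyed by tier ('L', 'S', 'W'); P is a dict keyed
    by link ('ue', 'gs', 'sg'). prop_west is the BS-to-cloud propagation
    delay via the satellite relay (hypothetical parameter).
    """
    t_ue, t_gs, t_sg = d / R, d / R_gs, d / R_sg  # transmission delays
    e_ue, e_gs, e_sg = P["ue"] * t_ue, P["gs"] * t_gs, P["sg"] * t_sg

    def proc_time(freq):
        return d * beta / freq

    def comp_energy(freq):
        return eps * freq ** 2 * d * beta  # equations (6), (10), (13)

    T_L = t_ue + proc_time(f["L"])
    T_S = t_ue + t_gs + H / C_LIGHT + proc_time(f["S"])
    T_W = t_ue + t_gs + t_sg + prop_west + proc_time(f["W"])

    U_L = w_time * T_L + w_cost * (
        e_ue * price["L"] + comp_energy(f["L"]) * price["L"])
    U_S = w_time * T_S + w_cost * (
        e_ue * price["L"] + e_gs * price["L"]
        + comp_energy(f["S"]) * price["S"])
    U_W = w_time * T_W + w_cost * (
        e_ue * price["L"] + e_gs * price["L"] + e_sg * price["S"]
        + comp_energy(f["W"]) * price["W"])
    return {"L": U_L, "S": U_S, "W": U_W}

# Illustrative values only (not taken from the source).
costs = tier_costs(d=1e6, beta=1000, R=1e6, R_gs=1e7, R_sg=1e7, H=3e5,
                   prop_west=0.01,
                   f={"L": 1e9, "S": 1e9, "W": 1e9},
                   P={"ue": 0.1, "gs": 1.0, "sg": 1.0},
                   eps=1e-27,
                   price={"L": 1.0, "S": 0.2, "W": 0.1},
                   w_time=0.5, w_cost=0.5)
best_tier = min(costs, key=costs.get)
print(costs, best_tier)
```

The remote tiers add delay terms but are charged at lower electricity prices, so which tier is cheapest depends on the weights and on the price gap.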

Allocating channel resources with the NOMA-based sub-channel user reorganization algorithm in step (3) includes:

1) Three sets are defined: the set of users; the set of subchannels, where each subchannel contains 2 users; and the set of sum rates of the users in each subchannel.

2) One user is selected from each of any two subchannels and the two users are swapped, producing new subchannel user combinations $n'_i$ and $n'_j$ ($i \neq j$); $q'_i$ and $q'_j$ are then calculated. The user reorganization succeeds if the following inequality is satisfied:
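The swap step can be sketched as a simple local search. The acceptance inequality is not reproduced in the text, so this sketch assumes a swap succeeds when it strictly increases the combined sum rate of the two subchannels involved, that is, $q'_i + q'_j > q_i + q_j$. The per-pair rate uses the assumed uplink NOMA form with equal transmit power and hypothetical channel gains `gains`; the function names are illustrative.

```python
import math
from itertools import combinations

def pair_sum_rate(h_a, h_b, B, P, n0):
    """Sum rate of two users on one subchannel (assumed uplink NOMA form)."""
    hi, lo = max(h_a, h_b), min(h_a, h_b)
    first = B * math.log2(1 + P * hi / (P * lo + n0))  # decoded first
    second = B * math.log2(1 + P * lo / n0)            # decoded second
    return first + second

def reorganize(pairs, gains, B, P, n0, max_rounds=100):
    """Swap users between subchannels while a swap raises the sum rate."""
    pairs = [list(p) for p in pairs]

    def q(pair):
        return pair_sum_rate(gains[pair[0]], gains[pair[1]], B, P, n0)

    for _ in range(max_rounds):
        improved = False
        for i, j in combinations(range(len(pairs)), 2):
            for x in (0, 1):
                for y in (0, 1):
                    before = q(pairs[i]) + q(pairs[j])
                    # Tentatively exchange one user from each subchannel.
                    pairs[i][x], pairs[j][y] = pairs[j][y], pairs[i][x]
                    if q(pairs[i]) + q(pairs[j]) > before:
                        improved = True  # assumed success condition
                    else:
                        # Undo the swap: it did not improve the sum rate.
                        pairs[i][x], pairs[j][y] = pairs[j][y], pairs[i][x]
        if not improved:
            break
    return [tuple(p) for p in pairs]

# Illustrative values only: four users, two subchannels.
gains = [1e-5, 5e-6, 1e-7, 5e-8]
print(reorganize([(0, 1), (2, 3)], gains, B=1e6, P=0.1, n0=1e-9))
```

Because every accepted swap strictly increases the total sum rate and the number of pairings is finite, the loop stops at a pairing that no single swap can improve; this is a local optimum, not necessarily the global one.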

Solving the optimization problem by converting it into an equivalent reinforcement learning problem with the Q-Learning-based computation offloading and resource allocation algorithm in step (4) includes:

1) The optimization problem is converted into an equivalent reinforcement learning problem, namely:

State space: the state space is the set of available resources of the target servers, expressed as $S = \{F_L, F_S, F_W\}$;

Action space: the action space consists of two parts, the offloading decision vector and the resource allocation vector, and is expressed as $A = \{af_1 \ldots af_M, bf_1 \ldots bf_M, cf_1 \ldots cf_M\}$;

Reward: the reward function for executing action $a^k$ in state $s^k$ is defined as $R(s^k, a^k) = -U(s^k, a^k)$.

2) The base station controller selects the action with the largest known action value with probability $\varepsilon$ ($0 < \varepsilon < 1$), and selects a random action to execute with probability $1 - \varepsilon$:

3) During exploitation or exploration, the Q table is updated by formula (17) every time an action is executed. After repeated training, once the Q table has converged, it can guide the agent in selecting the optimal policy:

where $R$ is the immediate reward obtained by executing action $a^k$ in state $s^k$; the maximum term is the largest Q value among all actions available after the state transition; $\alpha$ is the learning rate, that is, the weight given to the new Q value; and $\gamma$ is the discount rate.
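The action selection and Q-table update described above can be sketched as follows. Formula (17) is not reproduced in the text, so the code assumes the standard tabular Q-Learning update $Q(s,a) \leftarrow Q(s,a) + \alpha\left[R + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$. Following the convention stated in the text, `eps` is the probability of exploiting the best known action (not of exploring). The integer encoding of states and actions and the function names are assumptions for illustration.

```python
import random

def select_action(Q, state, eps, rng):
    """Pick an action index for the given state.

    With probability eps, exploit the action with the largest Q value;
    with probability 1 - eps, explore a uniformly random action.
    """
    row = Q[state]
    if rng.random() < eps:
        return max(range(len(row)), key=row.__getitem__)
    return rng.randrange(len(row))

def q_update(Q, state, action, reward, next_state, alpha, gamma):
    """Assumed standard tabular Q-Learning update, applied in place."""
    target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])

# Toy usage: 2 states, 2 actions, reward = -cost as in R = -U.
rng = random.Random(0)
Q = [[0.0, 0.0], [0.0, 0.0]]
action = select_action(Q, state=0, eps=0.9, rng=rng)
q_update(Q, state=0, action=action, reward=-2.0, next_state=1,
         alpha=0.5, gamma=0.9)
print(Q)
```

Because the reward is the negative of the cost $U$, maximizing the Q value corresponds to minimizing the weighted time and offloading cost.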

The advantages of the present invention are as follows. For multi-layer MEC resources with power cost differences, the optimization objective is to minimize the weighted sum of the time cost and offloading cost of all users. The problem is formulated as a mixed integer programming problem through joint offloading decision and resource allocation, and a solution based on NOMA and Q-Learning for optimizing transmission and offloading is proposed. Simulation results show that the multi-layer MEC architecture of the present application outperforms the traditional single-layer MEC architecture and that the algorithm outperforms other baseline algorithms.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the network model in an embodiment of the present invention;

FIG. 2 is a graph of the iterative process of optimizing the average transmission rate in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the change in total cost during algorithm training in an embodiment of the present invention;

FIG. 4 is a schematic diagram comparing costs under different transmission schemes in an embodiment of the present invention;

FIG. 5 is a schematic diagram comparing costs under different offloading strategies in an embodiment of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and an embodiment.

Embodiment:

FIG. 1 shows a network model with three layers of computing resources (edge, satellite, and cloud). In this model, user equipment can offload computing tasks through the base station to a target server for processing. The base station, which is the access point of the user equipment, integrates a traditional cellular base station and a terrestrial satellite terminal, and uses the C band and the Ka band to communicate with ground equipment and LEO satellites, respectively.

Assume there are M users within the coverage of the base station. Each user has one computing task $d_i$ ($i \in M$) to be offloaded, and the task is indivisible. The task data of all users form a set. The three layers of computing resources that provide computing services to users are:

(1) the local MEC server on the base station side;

(2) the satellite MEC server carried on the LEO satellite;

(3) the western cloud server built in the western region.

The user equipment uploads computing tasks to the base station using NOMA, and the base station controller schedules the user tasks centrally according to the task attributes and the real-time status of the resources at each level. When a computing task is offloaded to the local MEC server, a = 1, otherwise a = 0; when it is offloaded to the satellite MEC server, b = 1, otherwise b = 0; when it is offloaded to the western cloud server, c = 1, otherwise c = 0. Together, a, b, and c form the offloading decision.

A NOMA transmission scheme is adopted in the transmission stage. In this scheme, each subchannel is assumed to be occupied by two users simultaneously, and the users within a subchannel are orthogonal to each other, so the M users are divided into N pairs, that is, there are N subchannels. At the receiver, decoding follows the NOMA uplink principle of decoding in descending order of channel gain. When the first user is decoded, the second user is treated as interference, so the transmission rate of the first decoded user is:

Because this work processes user tasks in batches at the base station, and the distance between the base station and the satellite is fixed, physical distance alone cannot serve as the criterion for measuring channel gain and grouping users on the satellite-ground channel. To simplify the model, the transmission rate of the ground-to-satellite link is set to a constant $R_{GS}$ and the transmission rate of the satellite-to-ground link is set to a constant $R_{SG}$.

During task offloading, delay and offloading overhead are usually the two issues users care about most. The total cost of offloading a user task is therefore divided into a time cost and an offloading cost, and the weight factors $\omega$ and $\upsilon$ express their respective shares of the total cost.

The time cost consists of the transmission, propagation, and processing delays and depends on the distance from the user to the target server; the offloading cost consists of the transmission overhead and the computation overhead and depends on the energy consumption and the local unit electricity price.

(1) Local MEC offloading

In the local offloading scheme, the computing task is executed on the local MEC server on the base station side, so the propagation delay is negligible. The computing capacity that the local MEC server allocates to UE $i$ is $f_i^{L}$, and the number of CPU cycles required per bit of data is $\beta$ (cycles/bit), so the time cost of local offloading is:

where the first term is the transmission delay from the UE to the BS and the second term is the processing delay of the computing task.

State-of-the-art CPU architectures typically use dynamic voltage and frequency scaling. The dynamic power consumption $P$ is proportional to $V^2 f$, and at low voltage the computing frequency $f$ of the CPU chip is approximately linear in the supply voltage $V$, that is, $V = af$. The CPU power consumption is therefore modeled as $P = \varepsilon f^3$, where $f$ is the CPU computing frequency and $\varepsilon$ is a coefficient that depends on the chip architecture. The computation energy consumption of user $i$ on the local MEC server $l$ is then:

$$E_{i,l}^{C} = \varepsilon \left(f_i^{L}\right)^2 d_i \beta \qquad (23)$$

$$U_i^{L}(d,f) = \omega T_i^{L} + \upsilon\left(E_i^{T} p^{L} + E_{i,l}^{C} p^{L}\right) \qquad (24)$$

(2) Satellite MEC offloading

In the satellite offloading scheme, the distance from the BS to the access satellite is $H$, and the computing capacity that the satellite MEC server allocates to UE $i$ is $f_i^{S}$, so the time cost of satellite offloading is:

where the first term is the transmission delay from the UE to the BS, the second and third terms are the uplink transmission delay and the propagation delay of the ground-to-satellite link, and the fourth term is the processing delay of the computing task.

$$E_{i,s}^{C} = \varepsilon \left(f_i^{S}\right)^2 d_i \beta \qquad (27)$$

$$U_i^{S}(d,f) = \omega T_i^{S} + \upsilon\left(E_i^{T} p^{L} + E_{gs}^{T} p^{L} + E_{i,s}^{C} p^{S}\right) \qquad (28)$$

(3)西部云卸载(3) Uninstallation of Western Cloud

在西部卸载方案中，计算任务需经卫星中继上传到西部云服务器，当出现BS和云服务器不在同一个LEO卫星覆盖范围内的情形时，需靠星间链路辅助计算任务进行数据传输。由于星间链路状态的不可预测性，因此本文忽略了计算任务从一个LEO卫星到另一个LEO卫星的传输、传播时延。西部云服务器给UEi分配的计算能力为f_i ^W,故西部卸载的时间成本为：In the western offloading scheme, the computing task needs to be uploaded to the western cloud server via satellite relay. When the BS and the cloud server are not within the coverage of the same LEO satellite, the intersatellite link is required to assist the computing task in data transmission. Due to the unpredictability of the intersatellite link state, this paper ignores the transmission and propagation delay of the computing task from one LEO satellite to another. The computing power allocated by the western cloud server to UEi is _fi ^W , so the time cost of western offloading is:

E_i,w ^C＝ε(f_i ^W)²d_iβ (31)E _i,w ^C =ε(f _i ^W ) ² d _i β (31)

U_i ^W(d,f)＝ωT_i ^W+υ(E_i ^Tp^L+E_gs ^Tp^L+E_sg ^Tp^S+E_i,w ^Cp^W) (32)U _i ^W (d,f)＝ωT _i ^W +υ(E _i ^T p ^L +E _gs ^T p ^L +E _sg ^T p ^S +E _i,w ^C p ^W ) (32)

Since the computation result is much smaller than the input data, the return link for the data is not considered.
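The three cost functions (24), (28) and (32) can be sketched in Python as follows. This is only an illustration under stated assumptions: the time-cost and transmission-energy formulas are rebuilt from the verbal term-by-term descriptions, with transmission energy taken as power multiplied by transmission time and propagation delay taken as distance divided by the speed of light. The values of `EPS`, `OMEGA` and `UPSILON`, and all function and parameter names, are hypothetical.

```python
C_LIGHT = 3.0e8                           # propagation speed in m/s (assumed)
BETA = 120.0                              # CPU cycles per bit (simulation settings)
EPS = 1e-27                               # chip architecture coefficient (assumed value)
OMEGA, UPSILON = 0.5, 0.5                 # weight factors (assumed values)
P_LOCAL, P_SAT, P_WEST = 0.8, 0.6, 0.4    # unit electricity prices p^L, p^S, p^W


def computing_energy(f, d):
    """Equations (23), (27), (31): eps * f^2 * d * beta."""
    return EPS * f ** 2 * d * BETA


def local_cost(d, f, rate_ue, p_ue):
    """Equation (24): UE-to-BS transmission plus processing on the local MEC."""
    t = d / rate_ue + d * BETA / f
    e_tx = p_ue * d / rate_ue             # assumed form of E_i^T
    return OMEGA * t + UPSILON * (e_tx * P_LOCAL + computing_energy(f, d) * P_LOCAL)


def satellite_cost(d, f, rate_ue, p_ue, rate_gs, p_gs, height):
    """Equation (28): adds the ground-to-satellite uplink and propagation."""
    t = d / rate_ue + d / rate_gs + height / C_LIGHT + d * BETA / f
    e_tx = p_ue * d / rate_ue
    e_gs = p_gs * d / rate_gs             # assumed form of E_gs^T
    return OMEGA * t + UPSILON * (
        e_tx * P_LOCAL + e_gs * P_LOCAL + computing_energy(f, d) * P_SAT)


def western_cost(d, f, rate_ue, p_ue, rate_gs, p_gs, rate_sg, p_sg, height):
    """Equation (32): adds the satellite-to-ground downlink; inter-satellite
    delay is ignored, and propagation is assumed to be one hop up and one down."""
    t = (d / rate_ue + d / rate_gs + d / rate_sg
         + 2 * height / C_LIGHT + d * BETA / f)
    e_tx = p_ue * d / rate_ue
    e_gs = p_gs * d / rate_gs
    e_sg = p_sg * d / rate_sg             # assumed form of E_sg^T
    return OMEGA * t + UPSILON * (
        e_tx * P_LOCAL + e_gs * P_LOCAL + e_sg * P_SAT
        + computing_energy(f, d) * P_WEST)
```

For a 1 Mbit task, `local_cost(1e6, 1e9, 1e7, 0.1)` weighs 0.22 s of delay against the priced transmission and computing energy; the cheaper electricity of the remote layers only pays off when it outweighs their extra delay.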

For multi-layer MEC resources with differing latency and power costs, our goal is to minimize the weighted sum of the time costs and offloading costs of all users under limited computing power. To this end, we formulate the optimization problem as:

In problem (33), C1 means that tasks are indivisible: a computing task can only be executed entirely by the local MEC server, the satellite MEC server, or the western cloud server. C2-C4 mean that the computing resources allocated to the UEs cannot exceed the computing capacity of the target server.
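A feasibility check for constraints C1 to C4 might look like the sketch below. It assumes that C1 amounts to exactly one of the binary indicators a, b, c being 1 for each user, and that C2-C4 bound the summed allocations per layer by the capacities F_L, F_S and F_W; the function name and data layout are hypothetical.

```python
ONE_HOT = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}   # local, satellite, western cloud


def is_feasible(decisions, alloc, capacity):
    """Check C1 (indivisible task, one server) and C2-C4 (capacity limits).

    decisions: list of (a, b, c) tuples, one per user
    alloc:     list of computing power f_i allocated to each user (cycles/s)
    capacity:  (F_L, F_S, F_W) computing capacity of the three layers
    """
    used = [0.0, 0.0, 0.0]
    for choice, f in zip(decisions, alloc):
        if tuple(choice) not in ONE_HOT:      # C1 violated
            return False
        used[tuple(choice).index(1)] += f     # accumulate load on chosen layer
    # C2-C4: no layer may be asked for more than it has
    return all(u <= cap for u, cap in zip(used, capacity))
```

With the capacities from the simulation settings, `is_feasible([(1, 0, 0), (0, 1, 0)], [1e9, 1e9], (1e9, 2e9, 3e9))` holds, while assigning two users 0.6 Gcycles/s each on the local server does not.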

Because the offloading decision is a binary variable while the resource allocation vector changes dynamically, the problem is a mixed integer programming problem. It is NP-hard and very complicated to solve with traditional methods, so we propose a solution strategy based on Q-Learning.

NOMA-based sub-channel user reorganization algorithm

In the transmission phase, we use NOMA to allocate channel resources. Because UEs lie at different distances from the BS, their channel gains differ, so the way users are combined within a subchannel affects their transmission rates. We define three sets: the set of users; the set of subchannels, where each subchannel contains 2 users; and the set of sum rates of the users in each subchannel.

Definition 1: Select one user from each of any two sub-channels and swap their positions, producing new sub-channel user combinations $n_i'$ and $n_j'$ ($i \neq j$), and calculate $q_i'$ and $q_j'$. If the following inequality is satisfied, the user reorganization is successful.

To find the best user combination for each sub-channel, we apply Definition 1 in a NOMA-based sub-channel user reorganization algorithm that improves the average transmission rate of users.
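The swap-based reorganization can be sketched as below. This is not the patent's Algorithm 1: it assumes that a swap counts as successful when the combined sum rate of the two affected subchannels increases, and it computes uplink NOMA rates with the stronger user decoded first. The default power and noise values correspond to 20 dBm and -100 dBm; all function names and the tolerance `tol` are hypothetical.

```python
import itertools
import math


def pair_rate(h_a, h_b, bandwidth=1e6, power=0.1, noise=1e-13):
    """Sum rate (bit/s) of two uplink NOMA users sharing one subchannel.

    The stronger user is decoded first with the weaker one as interference;
    the weaker user is then decoded without interference.
    """
    strong, weak = max(h_a, h_b), min(h_a, h_b)
    first = bandwidth * math.log2(1 + power * strong / (power * weak + noise))
    second = bandwidth * math.log2(1 + power * weak / noise)
    return first + second


def total_rate(pairs, gains):
    """Sum of the subchannel sum rates for the given user pairing."""
    return sum(pair_rate(gains[u], gains[v]) for u, v in pairs)


def regroup(pairs, gains, tol=1e-3):
    """Swap users between subchannels while a swap raises the two sum rates."""
    pairs = [list(p) for p in pairs]
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(pairs)), 2):
            for a, b in itertools.product(range(2), repeat=2):
                old = total_rate([pairs[i], pairs[j]], gains)
                pairs[i][a], pairs[j][b] = pairs[j][b], pairs[i][a]
                new = total_rate([pairs[i], pairs[j]], gains)
                if new > old + tol:
                    improved = True       # keep the successful swap
                else:                     # undo an unsuccessful swap
                    pairs[i][a], pairs[j][b] = pairs[j][b], pairs[i][a]
    return [tuple(p) for p in pairs]
```

Each accepted swap strictly increases the total rate and the number of pairings is finite, so the loop terminates after a limited number of iterations.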

Computation offloading and resource allocation algorithm based on Q-Learning

Q-Learning is a model-free reinforcement learning process in which an agent learns, in an unfamiliar environment, a policy that maximizes its cumulative reward. For multi-layer MEC resources, we adopt a Q-Learning-based computation offloading and resource allocation algorithm (Q-CORAA).

The system model is defined by three key elements: state, action, and reward. To transform the optimization problem into an equivalent reinforcement learning problem, we express these elements as follows:

(1) State space: The state space is the set of available resources of the target servers, expressed as

$S = \{F_L, F_S, F_W\}$.

(2) Action space: The action space consists of two parts, the offloading decision vector and the resource allocation vector. Therefore, the action space can be expressed as

$A = \{a f_1 \ldots a f_M,\; b f_1 \ldots b f_M,\; c f_1 \ldots c f_M\}$.

(3) Reward: The reward is tied to the objective function. Since our optimization problem minimizes cost while Q-Learning maximizes reward, we define the reward for performing action $a^k$ in state $s^k$ as $R(s^k, a^k) = -U(s^k, a^k)$.

In Q-Learning, the optimal policy can be obtained from the Q function, so our first task is to train a Q function until it converges. Here both the state space S and the action space A are finite, which makes this a finite Markov decision process. We can therefore regard the Q function as a table of Q values with |S| rows and |A| columns, called the Q table, and select actions under its guidance.

Initially the Q table is empty and cannot guide the agent toward the best action. The agent must therefore interact with the environment to collect data, filling in and updating the Q table from experience. During this process, we want the agent both to exploit its existing experience by performing the best known action and to explore action options it has not yet tried. ε-greedy is a commonly used strategy for trading off exploitation against exploration.

Definition: ε-greedy strategy

When the agent makes a decision, it selects the action with the largest known action value with probability ε (0 < ε < 1), and randomly selects an action to execute with probability 1-ε.
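A minimal sketch of this selection rule follows. It keeps the convention of the definition above, in which ε is the probability of exploiting; many textbooks instead use ε for the exploration probability. The function name `select_action` and the list representation of one Q table row are assumptions for illustration.

```python
import random


def select_action(q_row, epsilon, rng=random):
    """Pick an action index from one Q table row.

    With probability epsilon, exploit: return the index of the largest Q value.
    With probability 1 - epsilon, explore: return a uniformly random index.
    """
    if rng.random() < epsilon:
        return max(range(len(q_row)), key=q_row.__getitem__)
    return rng.randrange(len(q_row))
```

Passing a seeded `random.Random` instance as `rng` makes training runs reproducible.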

During exploitation or exploration, the Q table must be updated through formula (17) each time an action is executed. After repeated training, once the Q table has converged, it can guide the agent in choosing the optimal policy.

Here R is the immediate reward obtained by executing action $a^k$ in state $s^k$; the maximization term is the largest Q value among all actions that can be taken after the state transition; α is the learning rate, i.e. the weight given to the new Q value; γ is the discount rate, which defines the importance of future rewards: the larger its value, the more emphasis is placed on long-term rewards. The optimization problem (33) is solved using Algorithm 2.
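The update step can be sketched as follows, assuming the standard tabular Q-Learning rule written as a blend of the old Q value and the new estimate, which matches the description of α as the weight of the new Q value. The nested-list Q table, the function name and the default values of `alpha` and `gamma` are hypothetical.

```python
def q_update(q_table, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning update in place and return the new value.

    q_table: list of rows, one per state, each a list of Q values per action
    reward:  immediate reward R(s, a), here the negative total cost -U
    """
    best_next = max(q_table[s_next])               # max Q after the transition
    target = reward + gamma * best_next            # new estimate
    q_table[s][a] = (1 - alpha) * q_table[s][a] + alpha * target
    return q_table[s][a]
```

Because the reward is the negative cost, Q values that converge closer to zero correspond to cheaper offloading decisions.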

Simulation Settings

The simulation was performed in MATLAB R2018b. We consider a circular simulation area with a radius of 500 m; the BS is located at the center and covers the entire area, and the low-orbit satellite is 500 km above the area. The UEs are randomly distributed in the area. Each UE's input task size is $d_i$ = 1 to 3 Mbits, and the number of CPU cycles required to process one task bit is β = 120 cycles/bit. The computing capacities of the local MEC, satellite MEC, and western cloud servers are set to 1 Gcycles/s, 2 Gcycles/s, and 3 Gcycles/s, respectively. The unit electricity price is adjusted dynamically according to how difficult local power resources are to obtain; here $p^L$ = 0.8, $p^S$ = 0.6, and $p^W$ = 0.4. The transmit power of each UE is 20 dBm, the subcarrier bandwidth is 1 MHz, the noise power is -100 dBm, and the path loss from UE to BS is $L(d) = 15.3 + 37.6\lg d$. We assume that one LEO satellite always provides full coverage of the area, that the transmit power of the BS and of the LEO satellite is 23 dBm, and that the satellite-ground communication rate is 10 Mbps.
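To give a feel for these settings, the sketch below converts the stated powers and path loss into a received signal-to-noise ratio and a single-user rate. It assumes that the distance d in L(d) is in meters, that the path loss is in dB, and that a user alone on a subcarrier follows the Shannon rate; the function names are hypothetical, and the patent's own simulation was written in MATLAB rather than Python.

```python
import math


def dbm_to_watt(dbm):
    """Convert a power level in dBm to watts."""
    return 10 ** ((dbm - 30) / 10)


def path_loss_db(distance_m):
    """Path loss L(d) = 15.3 + 37.6 lg d, with d assumed to be in meters."""
    return 15.3 + 37.6 * math.log10(distance_m)


def channel_gain(distance_m):
    """Linear channel power gain implied by the path loss."""
    return 10 ** (-path_loss_db(distance_m) / 10)


def single_user_rate(distance_m, tx_dbm=20.0, noise_dbm=-100.0, bandwidth_hz=1e6):
    """Shannon rate (bit/s) of one UE alone on a subcarrier."""
    snr = dbm_to_watt(tx_dbm) * channel_gain(distance_m) / dbm_to_watt(noise_dbm)
    return bandwidth_hz * math.log2(1 + snr)
```

Under these assumptions a UE at the cell edge (500 m) sees a far lower rate than one at 100 m, which is why the pairing of near and far users within a subchannel matters.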

Table 1 Main simulation parameters

Performance Evaluation

Figure 2 shows the iteration process of N-SURA. The average transmission rate of 10 users stabilizes after a limited number of iterations, and N-SURA achieves a higher average transmission rate than NOMA.

Figure 3 shows the training process of the agent. As the number of training episodes increases, the agent gradually approaches the optimal offloading policy and keeps the total cost at a low level.

In Figure 4, we solve the optimization problem with the Q-CORA algorithm under different transmission schemes. Across different numbers of users, N-SURA performs best, followed by NOMA, with OFDMA the worst.

Figure 5 shows that, across different numbers of users, traditional single-layer MEC computation offloading has the highest total cost. The main reason is that the local unit electricity price is higher than that of the other layers, which greatly increases the users' offloading cost; this verifies the feasibility of the proposed multi-layer MEC architecture. Within the multi-layer MEC architecture, the Q-CORA algorithm yields the lowest total cost, indicating that the proposed algorithm outperforms the other offloading algorithms.

Claims

1. A multi-layer MEC resource offloading method with power cost difference, characterized by comprising the following steps:

(1) Establish a network model with multiple layers of MEC resources with different power costs;

(2) Establish communication models and computational models at different resource levels;

(3) Allocate channel resources using the NOMA-based sub-channel user reorganization algorithm;

(4) Using the Q-Learning-based computation offloading and resource allocation algorithm, the optimization problem is transformed into an equivalent reinforcement learning problem. By training the intelligent agent, the Q table is converged to guide the offloading decision of the base station agent.

2. The multi-layer MEC resource offloading method according to claim 1 is characterized in that: step (1) of establishing a network model of multi-layer MEC resources with different power costs comprises:

1) The user equipment offloads the computing tasks to the target server for processing through the base station. The base station (BS) is the access point of the user equipment and integrates two types of equipment: traditional cellular base stations and ground satellite terminals. It uses C-band and Ka-band to communicate with ground equipment and low-orbit satellites respectively;

2) Assume that there are M users within the coverage of the base station. Each user has a computing task $d_i$ ($i \in M$) that needs to be offloaded and cannot be divided. The users' task data are recorded as a set. Three layers of computing resources provide computing services to users;

3) User equipment uses NOMA technology to upload computing tasks to the base station. The base station controller uniformly schedules user tasks according to the users' task attributes and the real-time status of resources at all levels; when the computing task is offloaded to the local MEC server, a = 1 is recorded, otherwise a = 0; when the computing task is offloaded to the satellite MEC server, b = 1 is recorded, otherwise b = 0; when the computing task is offloaded to the western cloud server, c = 1 is recorded, otherwise c = 0. The offloading decision is recorded as

3. The multi-layer MEC resource offloading method according to claim 2 is characterized in that: the three-layer computing resources providing computing services for users are respectively:

① Local MEC server on the base station side;

② Satellite MEC server carried on low-orbit satellite;

③ Western cloud server built in the western region.

4. The multi-layer MEC resource offloading method according to claim 1 is characterized in that: step (2) of establishing communication models under different resource levels includes:

1) Using the NOMA transmission scheme, it is assumed that each subchannel can be occupied by two users at the same time, and the users in the subchannel are orthogonal to each other. Therefore, M users are divided into N pairs, that is, there are N subchannels, denoted as

2) At the receiving end, decoding is performed according to the principle of NOMA uplink channel gain descending decoding; when decoding the first user, the second user is regarded as interference, so the transmission rate of the first decoded user is:

When decoding the second user, there is no user signal to interfere, and the transmission rate is:

Here $B$ is the subcarrier bandwidth, $P$ is the user's transmission power, and $n_0$ is the noise power.

3) For user equipment within the coverage of the base station, the base station controller processes user tasks uniformly in batches. Since the distance between the base station and the satellite is a fixed value, the transmission rate of ground-satellite communication is set to a constant $R_{GS}$, and the transmission rate of satellite-ground communication is set to a constant $R_{SG}$.

5. The multi-layer MEC resource offloading method according to claim 1 is characterized in that: step (2) of establishing a computing model at different resource levels includes:

1) The total cost of user task offloading is divided into time cost and offloading cost, and the weight factors ω and υ are used to represent their proportion in the total cost.

The time cost is composed of transmission, propagation and processing delays, which is related to the distance from the user to the target server; the offloading cost is composed of transmission overhead and computing overhead, which is related to energy consumption and local unit electricity price;

2) The computing power allocated by the local MEC server to UE $i$ is $f_i^{L}$, and the number of CPU cycles required for each bit of data is β (cycles/bit). Therefore, the time cost of local offloading is:

The first term is the transmission delay from UE to BS; the second term is the processing delay of the computing task;

The transmission power of UE $i$ is $P_i$, and the transmission energy consumption from UE to BS is:

Using dynamic voltage and frequency scaling technology, the dynamic power consumption $P$ is proportional to $V^{2} f$. Under the low voltage limit, the computing frequency $f$ of the CPU chip is approximately linearly related to the power supply voltage $V$, that is, $V = af$. The power consumption of the CPU is modeled as $P = \varepsilon f^{3}$, where $f$ is the computing frequency of the CPU and ε is the coefficient of the chip architecture. The computing energy consumption of user $i$ on the local MEC server $l$ is:

$$E_{i,l}^{C} = \varepsilon (f_i^{L})^{2} d_i \beta \tag{6}$$

The local unit electricity price is $p^{L}$, and the cost function of local offloading is:

$$U_i^{L}(d,f) = \omega T_i^{L} + \upsilon\left(E_i^{T} p^{L} + E_{i,l}^{C} p^{L}\right) \tag{7}$$

3) The distance from the BS to the access satellite is $H$, and the computing power allocated by the satellite MEC server to UE $i$ is $f_i^{S}$, so the time cost of satellite offloading is:

The first term is the transmission delay from UE to BS; the second and third terms are the uplink transmission and propagation delays of earth-satellite communication; the fourth term is the processing delay of the computing task;

The transmission power of the BS is $P_{GS}$, so the transmission energy consumption from the BS to satellite $s$ is:

The computing energy consumption of UEi on satellite MEC server s is:

$$E_{i,s}^{C} = \varepsilon (f_i^{S})^{2} d_i \beta \tag{10}$$

The unit electricity price of the satellite server is $p^{S}$, and the cost function of satellite offloading is:

$$U_i^{S}(d,f) = \omega T_i^{S} + \upsilon\left(E_i^{T} p^{L} + E_{gs}^{T} p^{L} + E_{i,s}^{C} p^{S}\right) \tag{11}$$

4) The computing power allocated by the western cloud server to UE $i$ is $f_i^{W}$, so the time cost of western offloading is:

The first term is the transmission delay from UE to BS; the second and third terms are the transmission delays of ground-satellite and satellite-ground communications; the fourth term is the propagation delay from BS to the western cloud server via satellite relay; and the fifth term is the processing delay of the computing task in the western cloud server.

The transmission power of the satellite is $P_{SG}$, and the transmission energy consumption from satellite $s$ to western cloud server $w$ is:

The computing energy consumption of UEi on the western cloud server w is:

$$E_{i,w}^{C} = \varepsilon (f_i^{W})^{2} d_i \beta \tag{13}$$

The unit electricity price of the western cloud server is $p^{W}$, and the cost function of western offloading is:

$$U_i^{W}(d,f) = \omega T_i^{W} + \upsilon\left(E_i^{T} p^{L} + E_{gs}^{T} p^{L} + E_{sg}^{T} p^{S} + E_{i,w}^{C} p^{W}\right) \tag{14}$$

6. The multi-layer MEC resource offloading method according to claim 1 is characterized in that: the step (3) uses the NOMA-based sub-channel user reorganization algorithm to allocate channel resources, including:

1) Define three sets: the set of users; the set of subchannels, where each subchannel contains 2 users; and the set of sum rates of the users in each subchannel;

2) Select one user from each of any two sub-channels and swap their positions, producing new sub-channel user combinations $n_i'$ and $n_j'$ ($i \neq j$), and calculate $q_i'$ and $q_j'$. If the following inequality is satisfied, the user reorganization is successful:

7. The multi-layer MEC resource offloading method according to claim 1 is characterized in that: the Q-Learning-based computation offloading and resource allocation algorithm in step (4) converts the optimization problem into an equivalent reinforcement learning problem for solution, including:

1) Convert the optimization problem into an equivalent reinforcement learning problem, namely:

State space: The state space is the set of available resources of the target servers, expressed as $S = \{F_L, F_S, F_W\}$;

Action space: The action space consists of two parts, the offloading decision vector and the resource allocation vector, and is expressed as $A = \{a f_1 \ldots a f_M,\; b f_1 \ldots b f_M,\; c f_1 \ldots c f_M\}$;

Reward: The reward function for executing action $a^k$ in state $s^k$ is defined as $R(s^k, a^k) = -U(s^k, a^k)$.

2) The base station controller selects the action with the largest known action value with probability ε (0 < ε < 1); and randomly selects an action with probability 1-ε to execute:

3) During exploitation or exploration, the Q table is updated through formula (17) each time an action is executed; after repeated training, once the Q table has converged, it can guide the agent in choosing the optimal policy:

Here R is the immediate reward obtained by executing action $a^k$ in state $s^k$; the maximization term is the largest Q value among all actions that can be taken after the state transition; α is the learning rate, i.e. the weight given to the new Q value; γ is the discount rate.