
CN115499849B - A method for cooperation between a wireless access point and a reconfigurable smart surface - Google Patents

A method for cooperation between a wireless access point and a reconfigurable smart surface

Info

Publication number
CN115499849B
Authority
CN
China
Prior art keywords
network
graph
access point
reconfigurable smart
agents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211429707.5A
Other languages
Chinese (zh)
Other versions
CN115499849A (en
Inventor
罗弦
廖荣涛
杨荣浩
李想
姚渭箐
董亮
刘芬
张岱
郭岳
王逸兮
李磊
孟浩华
王敬靖
胡欢君
龙霏
袁翔宇
王博涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Telecommunication Branch of State Grid Hubei Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Hubei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Telecommunication Branch of State Grid Hubei Electric Power Co Ltd
Priority to CN202211429707.5A
Publication of CN115499849A
Application granted
Publication of CN115499849B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/18 Network planning tools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 Network topologies
    • H04W84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

This application relates to a method for cooperation between wireless access points and reconfigurable smart surfaces, comprising the following steps: building a device communication architecture based on the power Internet of Things (IoT) network; designing, on the basis of that architecture, a corresponding cooperation method between access points and reconfigurable smart surfaces that aims to maximize system energy efficiency while meeting the quality-of-service requirements of massive numbers of devices in the power IoT network for data rate and transmission reliability; and having each access point and reconfigurable smart surface cooperate according to the trained model to meet the access requirements of those devices. The application models the very large wireless communication network as a graph and uses graph embedding to reduce its dimensionality and obtain an efficient graph representation, which effectively reduces model training complexity and enables highly customized communication.

Description

A method for cooperation between a wireless access point and a reconfigurable smart surface

Technical Field

The present application belongs to the technical field of the power Internet of Things (IoT) and relates in particular to a method for cooperation between a wireless access point and a reconfigurable smart surface.

Background Art

In recent years, with the rapid development of the power IoT, massive numbers of devices have been deployed at its network edge. Because the power network is large and complex, relying on manual control alone is difficult and costly, so new information and communication technologies are needed to improve the operating performance and control efficiency of the power system. Intelligent control of the power IoT network requires real-time sensing and measurement of the dispatch status and performance of the power network. The power IoT network must therefore support access by massive numbers of edge devices and the transmission of massive amounts of data in order to operate efficiently and reliably. As information and communication technology advances, the new generation of mobile communication technology can provide high-speed, stable service when large numbers of power devices connect to the power network. However, because network edge devices are heterogeneous, highly customized and intelligent communication, that is, dynamic configuration of network resources to support ultra-dense connections, cannot yet be achieved.

The reconfigurable smart surface is a new and revolutionary technology. By integrating a large number of low-cost passive reflecting elements on a planar surface, it can intelligently reconfigure the wireless propagation environment and thereby significantly improve the performance of wireless communication networks. Its highly controllable, intelligent signal reflection makes a high degree of customization possible, provides new degrees of freedom for further improving wireless link performance, and paves the way for an intelligent, programmable wireless environment. With wireless access points and reconfigurable smart surfaces cooperating to configure hybrid spatial beams flexibly, enhance data transmission on demand, suppress interference flexibly, and multiplex efficiently across the hybrid spatial and power domains, highly customized and intelligent communication can be achieved. Therefore, for power IoT scenarios with heterogeneous grids and massive numbers of devices, an effective technique for cooperation between wireless access points and reconfigurable smart surfaces urgently needs to be designed.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a method for cooperation between a wireless access point and a reconfigurable smart surface that models the wireless communication network as a graph and uses graph embedding to obtain an embedded representation of the network. Graph embedding yields a low-dimensional representation of the graph, reduces model training complexity, and enables highly customized communication.

To achieve the above purpose, the present application provides the following technical solution:

An embodiment of the present application provides a method for cooperation between a wireless access point and a reconfigurable smart surface, characterized by comprising the following steps:

Step 1: Build a device communication architecture based on the power IoT network. The network architecture includes M pre-installed access points and J reconfigurable smart surfaces. The cooperative relationship between each access point and its adjacent access points and reconfigurable smart surfaces is modeled as an interaction between agents, that is, as an edge in the graph neural network input. This yields the input topology of a message passing graph neural network, which is used to obtain an embedded representation of the topology so that service can be provided to power IoT terminals;

Step 2: Based on the device communication architecture built above, design a corresponding cooperation method between access points and reconfigurable smart surfaces that aims to maximize system energy efficiency while meeting the quality-of-service requirements of massive numbers of devices in the power IoT network for data rate and transmission reliability;

Step 3: Based on the cooperation method proposed in Step 2, each access point and reconfigurable smart surface cooperates according to the trained model to meet the access requirements of massive numbers of devices in the power IoT network.

Step 1 is specifically as follows:

Step 1: In the device communication architecture of the power IoT network, the pre-installed access points are denoted [formula image 001] and the reconfigurable smart surfaces are denoted [formula image 002]. The M wireless access points and J reconfigurable smart surfaces are represented as distinct agent nodes, which form the nodes of the graph neural network input. The access information of the power IoT devices and the hybrid spatial beam configuration between the wireless access points and the reconfigurable smart surfaces are treated as features of the graph topology and fed into the message passing graph neural network, whose message passing mechanism produces a stable graph embedding of the node features.
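The graph construction in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `AgentGraph` class, the node naming scheme (`ap0`, `ris0`), and the choice to store every cooperation link in both directions are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AgentGraph:
    """Directed graph whose nodes are access points (APs) and smart surfaces."""
    node_kind: dict = field(default_factory=dict)  # node id -> "ap" or "ris"
    edges: set = field(default_factory=set)        # (source id, target id) pairs


def build_agent_graph(num_aps, num_surfaces, links):
    """Create M AP nodes and J surface nodes, then add the cooperation links."""
    graph = AgentGraph()
    for m in range(num_aps):
        graph.node_kind[f"ap{m}"] = "ap"
    for j in range(num_surfaces):
        graph.node_kind[f"ris{j}"] = "ris"
    for src, dst in links:
        if src not in graph.node_kind or dst not in graph.node_kind:
            raise KeyError(f"unknown node in link ({src}, {dst})")
        # Assumption: cooperation is mutual, so store both directions.
        graph.edges.add((src, dst))
        graph.edges.add((dst, src))
    return graph


def neighbors(graph, node):
    """Return the sorted ids of nodes reachable over one outgoing edge."""
    return sorted(dst for src, dst in graph.edges if src == node)


# Hypothetical topology: two APs and one surface.
graph = build_agent_graph(2, 1, [("ap0", "ap1"), ("ap0", "ris0")])
```

Node and edge features (device access information, beam configuration) would be attached to these nodes and edges before being passed to the graph neural network.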

Step 2 is specifically as follows:

Step 2.1: To dynamically maximize the energy efficiency of the system in which wireless access points and reconfigurable smart surfaces cooperate, the objective function of the system can be expressed as:

[formula image 003]

where [formula image 004] denotes the network energy efficiency in time slot t and [formula image 005] denotes the user parameters. By jointly considering reconfigurable smart surface element selection, coordinated discrete phase shift control, and power allocation, the above long-term energy efficiency optimization problem is modeled as a decentralized partially observable Markov decision process (Dec-POMDP). After this conversion, the optimization function becomes:

[formula image 007]

where [formula image 008] denotes a positive coefficient that controls the trade-off between energy efficiency and transmission reliability; [formula image 009] is a non-negative parameter that penalizes data rate violations; [formula image 010] denotes the data rate limit; [formula image 011] is a fixed value in each time slot; [formula image 012] denotes the data rate in each time slot; [formula image 013] denotes the number of antennas; and [formula image 014] denotes the users served cooperatively by the access points and reconfigurable smart surfaces.

The global reward function can be expressed as:

[formula image 016]
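The roles of the trade-off coefficient and the rate-violation penalty described above can be illustrated with a small sketch. The exact reward expression is contained in a formula image that is not reproduced in this text, so the form below (weighted energy efficiency minus a penalty proportional to the total rate shortfall) is an assumption, as are the function and parameter names.

```python
def penalized_reward(energy_efficiency, rates, rate_limit, tradeoff, penalty):
    """Assumed reward shape: reward efficiency, penalize users below the rate limit."""
    if tradeoff <= 0:
        raise ValueError("tradeoff must be positive")
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    # Only users whose rate falls short of the limit contribute to the penalty.
    shortfall = sum(max(0.0, rate_limit - r) for r in rates)
    return tradeoff * energy_efficiency - penalty * shortfall
```

With a larger `penalty`, a policy that starves one user to raise overall efficiency scores worse than one that keeps every user at or above the rate limit.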

Step 2.2: More efficient cooperative learning is achieved by integrating two techniques, graph embedding and different rewards. The agents represent the wireless access points and reconfigurable smart surfaces, and the interactions between agents represent the wireless communication environment and the way the agents communicate. The agents and their interactions are modeled as a directed communication graph [formula image 017], in which the agents are modeled as nodes I and the interactions between agents as directed edges [formula image 018]; [formula image 019] denotes the node features and [formula image 020] denotes the edge features.

The node features of wireless access point i comprise the spatial channel information from the access point to its associated devices, the queue information of the associated users, and the access point's local action-observation history:

[formula image 021]

The edge features describe the interaction from agent [formula image 022] to agent [formula image 023], which can be expressed mathematically as:

[formula image 025]

Step 2.3: Because graph nodes and edges in a large-scale network have high-dimensional features, an action generation module based on graph embedding is proposed, in which a message passing graph neural network is maintained at each distributed node [formula image 022]. Like a multilayer perceptron, the message passing graph neural network has a layered structure. In each layer, every agent first transmits its embedding to its neighboring agents, then aggregates the embeddings received from its neighbors and updates its local hidden state. The message passing process is given by:

[formula image 026]

where [formula image 027] denotes the message function and [formula image 028] denotes the update operation. After the graph embedding module, agent [formula image 022] uses a gated recurrent unit (GRU), a simplified variant of the long short-term memory network, to predict its local action from the output local embedding state [formula image 029]. The local embedding state is given by:

[formula image 030]

The local action [formula image 031] taken by agent [formula image 022] is sampled from the action generation sub-policy [formula image 032].
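One round of the send, aggregate, and update cycle described in Step 2.3 can be sketched in plain Python. The patent's message function and update operation are given only as formula images, so the concrete choices here (a scaled sum for the message, mean aggregation, a tanh update, and the scalar weights `w_msg` and `w_upd`) are assumptions for illustration.

```python
import math


def message_passing_layer(hidden, edges, edge_feat, w_msg, w_upd):
    """One layer: each node sends along its outgoing edges, then updates itself.

    hidden:    node id -> list of floats (local hidden state)
    edges:     iterable of (source id, target id)
    edge_feat: (source id, target id) -> list of floats, same length as the states
    """
    inbox = {node: [] for node in hidden}
    for src, dst in edges:
        # Message function (assumed): scaled sum of sender state and edge feature.
        msg = [w_msg * (h + e) for h, e in zip(hidden[src], edge_feat[(src, dst)])]
        inbox[dst].append(msg)

    updated = {}
    for node, state in hidden.items():
        msgs = inbox[node]
        if msgs:
            # Aggregation (assumed): element-wise mean over incoming messages.
            agg = [sum(col) / len(msgs) for col in zip(*msgs)]
        else:
            agg = [0.0] * len(state)
        # Update operation (assumed): tanh of weighted old state plus aggregate.
        updated[node] = [math.tanh(w_upd * s + a) for s, a in zip(state, agg)]
    return updated


# Hypothetical two-node example: one AP sending to one smart surface.
states = {"ap0": [1.0, 0.0], "ris0": [0.0, 1.0]}
out = message_passing_layer(
    states, [("ap0", "ris0")], {("ap0", "ris0"): [0.5, 0.5]}, w_msg=1.0, w_upd=1.0
)
```

Stacking several such layers lets information from agents more than one hop away reach each node's embedding, which would then feed the recurrent action predictor.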

Step 2.4: Denote the combined parameters of the graph embedding module and the action generation module in the distributed policy by [formula image 033]. The goal is to maximize the performance function:

[formula image 034]

where [formula image 035] is the joint state transition under the joint policy [formula image 036]. The policy gradient is computed from the advantage function and is given by:

[formula image 038]

where [formula image 039] is the actual input to the graph embedding and [formula image 040] denotes the temporal-difference advantage, given by:

[formula image 041]

where [formula image 042] denotes the global state value and [formula image 043] denotes the global state-action value. To solve the credit assignment problem during training, value decomposition is used to train the distributed networks: the global state value [formula image 044] is decomposed into local state values combined by a mixing function, as follows:

[formula image 046]

where [formula image 047] denotes the local state value of agent [formula image 022]. During centralized training, each agent obtains a different reward by evaluating, from its local graph embedding features, its contribution to the improvement of the global reward, which further promotes coordination among agents. Let [formula image 048] denote the weight parameters of the distributed networks, which are shared among agents, and let [formula image 049] denote the weights of the mixing network [formula image 050]. The distributed and mixing networks are optimized by mini-batch gradient descent to minimize the following loss:

[formula image 051]

where [formula image 052] is the n-step return bootstrapped from the last state, with n upper-bounded by T. The parameters of the mixing network can be updated by:

[formula image 053]

where [formula image 054] is the learning rate of the mixing network update. The weight parameters of the non-output layers are further shared across the distributed networks. Denoting the combined weight parameters of the distributed networks by [formula image 055], the gradient with respect to [formula image 056] can be computed as:

[formula image 057]

The update rule of the distributed networks can then be derived as:

[formula image 058]

where [formula image 059] and [formula image 060] denote the policy improvement learning rate and the critic learning rate, respectively.
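Two quantities named in Step 2.4, the n-step return bootstrapped from the last state and the temporal-difference advantage, have standard textbook forms that can be sketched as below. The patent's own expressions are in formula images not reproduced here, so these are the conventional definitions rather than a transcription, and the discount factor `gamma` is an assumed parameter.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of n rewards plus the discounted value of the last state."""
    ret = bootstrap_value
    for r in reversed(rewards):
        # Fold from the last step backwards: G_t = r_t + gamma * G_{t+1}.
        ret = r + gamma * ret
    return ret


def td_advantage(reward, value, next_value, gamma):
    """One-step temporal-difference advantage: r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value
```

A positive advantage indicates the action did better than the critic expected, so the policy gradient step raises its probability; the squared gap between the n-step return and the critic's estimate is a common choice of critic loss.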

Step 3 is specifically as follows:

Step 3.1: Feed the actually observed power IoT data, as the agents' observed states and the environment information, into the graph-embedding-based network update algorithm; initialize the network parameters and the network learning rates [formula image 061];

Step 3.2: Sample a batch of data [formula image 062] from the experience pool, compute the policy gradient [formula image 063] and the network loss [formula image 064] using the formulas derived in Step 2.4, and update the mixing network parameters using the mixing network update formula in Step 2.4;

Step 3.3: Update the network parameters in the power IoT network using the distributed network parameter update rule described in Step 2.4, until the network converges;

Step 3.4: Update the trained network parameters periodically, or retrain and update them when the power IoT network changes significantly, so as to meet the access requirements of devices in the power IoT network and achieve customized communication.
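The experience pool used in Step 3.2 can be sketched as a bounded buffer with uniform random sampling. The class name, the fixed capacity, and uniform sampling are assumptions; the patent does not specify how the pool is stored or sampled.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)  # private generator for reproducible batches

    def add(self, transition):
        self._buffer.append(transition)

    def sample(self, batch_size):
        """Draw a batch uniformly at random, without replacement."""
        if batch_size > len(self._buffer):
            raise ValueError("not enough transitions stored")
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


# Hypothetical usage: integers stand in for (state, action, reward, next state) tuples.
pool = ExperiencePool(capacity=3, seed=0)
for step in range(5):
    pool.add(step)
batch = pool.sample(2)
```

Each sampled batch would then drive one policy gradient and loss computation, followed by the mixing network and distributed network updates, repeated until convergence.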

Compared with the prior art, the beneficial effects of the present application are as follows. Addressing the needs of power IoT networks, the application proposes a framework for cooperation between wireless access points and reconfigurable smart surfaces that meets the access requirements of massive numbers of devices. Through this cooperation, the system energy efficiency is dynamically maximized, achieving efficient communication. In addition, the application proposes a graph-embedding-based representation of wireless networks: the very large wireless communication network is modeled as a graph, and graph embedding reduces its dimensionality to obtain an efficient graph representation. The proposed method effectively reduces model training complexity and enables highly customized communication.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present application and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.

FIG. 1 is a flow chart of the method according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described below with reference to the drawings. Note that similar reference numerals and letters denote similar items in the drawings, so once an item is defined in one drawing it need not be defined or explained again in subsequent drawings.

The terms "comprises", "includes", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises it.

Referring to FIG. 1, the present application provides a method for cooperation between a wireless access point and a reconfigurable smart surface, comprising the following steps.

Step 1: Build a device communication architecture based on the power IoT network. The network architecture includes M pre-installed access points and J reconfigurable smart surfaces. The cooperative relationship between each access point and its adjacent access points and reconfigurable smart surfaces is modeled as an interaction between agents, that is, as an edge in the graph neural network input. This yields the input topology of a message passing graph neural network, which is used to obtain an embedded representation of the topology so that service can be provided to power IoT terminals;

Step 2: Based on the device communication architecture built above, design a corresponding cooperation method between access points and reconfigurable smart surfaces that aims to maximize system energy efficiency while meeting the quality-of-service requirements of massive numbers of devices in the power IoT network for data rate and transmission reliability;

Step 3: Based on the cooperation method proposed in Step 2, each access point and reconfigurable smart surface cooperates according to the trained model to meet the access requirements of massive numbers of devices in the power IoT network.

Preferably, Step 1 is specifically as follows:

Step 1: In the device communication architecture of the power IoT network, the pre-installed access points are denoted [formula image 001] and the reconfigurable smart surfaces are denoted [formula image 002]. The M wireless access points and J reconfigurable smart surfaces are represented as distinct agent nodes, which form the nodes of the graph neural network input. The access information of the power IoT devices and the hybrid spatial beam configuration between the wireless access points and the reconfigurable smart surfaces are treated as features of the graph topology and fed into the message passing graph neural network, whose message passing mechanism produces a stable graph embedding of the node features.

Preferably, Step 2 is specifically as follows:

Step 2.1: The network edge of the power IoT hosts a massive number of devices, so a high-performance framework for massive device access has to be designed carefully. By designing the cooperation between access points and reconfigurable smart surfaces, the hybrid beams can be reconfigured flexibly and in a coordinated manner, so that devices are admitted to the communication network in a coordinated way and customizable intelligent communication is achieved. To dynamically maximize the system energy efficiency of cooperating wireless access points and reconfigurable smart surfaces, the objective function of the system can be expressed as:

[Figure 977932DEST_PATH_IMAGE065]

where [Figure 898614DEST_PATH_IMAGE004] denotes the network energy efficiency in time slot t. This objective can be modeled as a constrained Markov decision process. However, solving it in a centralized manner is computationally inefficient, because the joint state-action space is large and exchanging high-dimensional information between multiple wireless access points and reconfigurable smart surfaces and a centralized controller is costly. To handle the problem efficiently and with low complexity, and to maximize network energy efficiency while guaranteeing the performance of diverse users, we jointly consider reconfigurable smart surface element selection, coordinated discrete phase-shift control, and power allocation, and model the long-term energy efficiency optimization problem as a decentralized partially observable Markov decision process. Specifically, the partially observable Markov decision process provides a general framework for describing Markov decision processes with incomplete information, and the decentralized partially observable Markov decision process extends it to decentralized settings.

Based on Lyapunov optimization theory, the above optimization problem can be transformed into a decentralized partially observable Markov decision process. The transformed optimization function is:

[Figure 405075DEST_PATH_IMAGE066]

where [Figure 298076DEST_PATH_IMAGE008] is a positive coefficient that controls the trade-off between energy efficiency and transmission reliability; [Figure 353495DEST_PATH_IMAGE009] is a non-negative parameter that penalizes data-rate violations; [Figure 495894DEST_PATH_IMAGE010] is the data-rate limit; [Figure 337204DEST_PATH_IMAGE011] takes a fixed value in each time slot; [Figure 819133DEST_PATH_IMAGE012] is the data rate in each time slot; [Figure 717556DEST_PATH_IMAGE013] is the number of antennas; and [Figure 940727DEST_PATH_IMAGE014] denotes the users served jointly by the access points and the reconfigurable smart surfaces.

Its global reward function can be expressed as:

[Figure 756367DEST_PATH_IMAGE067]
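
The reward formula itself survives here only as an image reference, so its exact form cannot be reproduced. The sketch below therefore assumes one plausible shape that is consistent with the symbol descriptions in Step 2.1: energy efficiency weighted by the positive trade-off coefficient, minus a penalty proportional to each user's data-rate shortfall. The function name `global_reward`, its parameter names, and the linear shortfall penalty are all assumptions for illustration, not the patent's formula.

```python
def global_reward(energy_efficiency, rates, rate_limit, tradeoff_v, penalty):
    """Assumed per-slot reward: weighted energy efficiency minus a penalty
    on the total amount by which user rates fall below the rate limit."""
    if tradeoff_v <= 0:
        raise ValueError("the trade-off coefficient must be positive")
    if penalty < 0:
        raise ValueError("the penalty parameter must be non-negative")
    # Only users below the limit contribute; users above it add nothing.
    shortfall = sum(max(0.0, rate_limit - r) for r in rates)
    return tradeoff_v * energy_efficiency - penalty * shortfall


# Hypothetical numbers: one user meets the limit, one misses it by 1.0.
example_reward = global_reward(2.0, [5.0, 3.0], 4.0, 1.5, 0.5)
```

With a larger trade-off coefficient the agents are pushed toward energy efficiency; with a larger penalty they are pushed toward satisfying every user's data-rate limit.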

Step 2.2: The optimization problem described in Step 2.1 can be solved with conventional multi-agent reinforcement learning. However, cooperation requires neighboring agents to exchange information, so conventional methods incur high communication overhead and latency when the information is high-dimensional. Existing multi-agent reinforcement learning methods are therefore inefficient at solving highly coupled decentralized partially observable Markov decision processes. We extend the centralized-training, decentralized-execution scheme commonly used in existing multi-agent reinforcement learning algorithms and achieve more efficient cooperative learning by integrating two techniques: graph embedding and differentiated rewards. The agents represent the wireless access points and the reconfigurable smart surfaces, and the interactions between agents represent the wireless communication environment and its communication modes. The agents and their interactions are modeled as a directed communication graph [Figure 633146DEST_PATH_IMAGE017], in which the agents are modeled as nodes I and the interactions between agents are modeled as directed edges [Figure 203936DEST_PATH_IMAGE018]; [Figure 147359DEST_PATH_IMAGE019] denotes the node features and [Figure 766690DEST_PATH_IMAGE020] denotes the edge features.

The node features of wireless access point i comprise the spatial channel information from the access point to its associated devices, the queue information of its associated users, and the access point's local action-observation history:

[Figure 879003DEST_PATH_IMAGE021]

The edge features describe the interaction from agent [Figure 856579DEST_PATH_IMAGE022] to agent [Figure 195288DEST_PATH_IMAGE023], which can be expressed mathematically as:

[Figure 913583DEST_PATH_IMAGE068]

Step 2.3: Because graph nodes and edges carry high-dimensional features in large-scale networks, we propose an action generation module based on graph embedding. The module uses a message passing graph neural network to learn low-dimensional embedding features of the directed graph, which improves the generalization ability of the network and strengthens cooperation between wireless access points and reconfigurable smart surfaces while requiring only low information exchange overhead.

A message passing graph neural network is maintained at each distributed node [Figure 490189DEST_PATH_IMAGE022]. Like a multilayer perceptron, the message passing graph neural network has a layered structure. In each layer, every agent first transmits its embedding to its neighboring agents, then aggregates the embeddings received from its neighbors and updates its local hidden state. The message passing process is:

[Figure 169825DEST_PATH_IMAGE026]

where [Figure 464672DEST_PATH_IMAGE027] denotes the message function and [Figure 986658DEST_PATH_IMAGE028] denotes the update operation. After the graph embedding module, agent [Figure 542404DEST_PATH_IMAGE022] uses a gated recurrent unit to predict its local action from the output local embedding state [Figure 330625DEST_PATH_IMAGE029]. The gated recurrent unit is a simplified variant of the long short-term memory network. The local embedding state is:

[Figure 909505DEST_PATH_IMAGE030]
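
The transmit, aggregate, and update cycle of one message passing layer can be illustrated with a small pure-Python sketch. The concrete choices here are assumptions made only to keep the example runnable: a message function that scales the sender's embedding by a scalar edge feature, mean aggregation over incoming messages, and a tanh update. The patent's actual message function and update operation are given only as image references, and the gated recurrent unit stage is not modeled.

```python
import math


def message(sender_embedding, edge_feature):
    """Assumed message function: sender embedding scaled by a scalar edge feature."""
    return [edge_feature * x for x in sender_embedding]


def update(own_embedding, aggregated):
    """Assumed update operation: tanh of the element-wise sum."""
    return [math.tanh(a + b) for a, b in zip(own_embedding, aggregated)]


def message_passing_layer(hidden, edges):
    """Run one layer. `hidden` maps node -> embedding list and `edges`
    maps a directed pair (source, destination) -> scalar edge feature."""
    new_hidden = {}
    for node, own in hidden.items():
        incoming = [message(hidden[src], feat)
                    for (src, dst), feat in edges.items() if dst == node]
        if incoming:
            # Mean of the incoming messages, computed per component.
            aggregated = [sum(vals) / len(incoming) for vals in zip(*incoming)]
        else:
            aggregated = [0.0] * len(own)
        new_hidden[node] = update(own, aggregated)
    return new_hidden


# Hypothetical two-agent graph with one edge in each direction.
hidden = {"AP0": [1.0, 0.0], "RIS0": [0.0, 1.0]}
edges = {("AP0", "RIS0"): 0.5, ("RIS0", "AP0"): 2.0}
out = message_passing_layer(hidden, edges)
```

Stacking several such layers lets each agent's hidden state reflect agents more than one hop away while each agent only ever exchanges low-dimensional embeddings with its direct neighbors.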

The local action [Figure 848697DEST_PATH_IMAGE031] taken by agent [Figure 533384DEST_PATH_IMAGE022] is sampled from the action generation sub-policy [Figure 837512DEST_PATH_IMAGE032].
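
Sampling a local action from a sub-policy can be sketched as below, assuming the action set is discrete (which fits the discrete phase-shift control mentioned in Step 2.1) and that the policy outputs one unnormalized score per action. The softmax parameterization and the names `softmax` and `sample_local_action` are assumptions for this example, not details stated in the patent.

```python
import math
import random


def softmax(logits):
    """Turn unnormalized scores into probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_local_action(logits, rng):
    """Draw one action index with probability given by softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate actions of one agent.
rng = random.Random(0)
probs = softmax([0.2, 1.5, -0.3])
action = sample_local_action([0.2, 1.5, -0.3], rng)
```

Sampling, rather than always taking the highest-scoring action, keeps exploration alive during training; at execution time a deployment might instead choose the most probable action.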

Step 2.4: Let [Figure 467470DEST_PATH_IMAGE033] denote the combined parameters of the graph embedding module and the action generation module in the distributed policy. The goal is to maximize the performance function:

[Figure 832724DEST_PATH_IMAGE034]

where [Figure 2543DEST_PATH_IMAGE035] is the joint state transition under the joint policy [Figure 490156DEST_PATH_IMAGE036]. The policy gradient is therefore computed from the advantage function and is given by:

[Figure 43628DEST_PATH_IMAGE037]

where [Figure 182879DEST_PATH_IMAGE039] is the actual input to the graph embedding and [Figure 708669DEST_PATH_IMAGE040] denotes the temporal-difference advantage, given by:

[Figure 537823DEST_PATH_IMAGE041]

where [Figure 47432DEST_PATH_IMAGE042] denotes the global state value and [Figure 255953DEST_PATH_IMAGE043] denotes the global state-action value. To address the credit assignment problem during training, we use value decomposition to train the distributed networks, decomposing the global state value [Figure 760884DEST_PATH_IMAGE044] into a form combined through a mixing function:

[Figure 760939DEST_PATH_IMAGE069]

where [Figure 492266DEST_PATH_IMAGE047] denotes the local state value of agent [Figure 973319DEST_PATH_IMAGE022]. During centralized training, each agent uses its local graph embedding features to evaluate its contribution to the improvement of the global reward and receives a differentiated reward accordingly, which further promotes coordination among agents. Let [Figure 332756DEST_PATH_IMAGE048] denote the weight parameters of the distributed networks, which are shared among the agents, and let [Figure 270756DEST_PATH_IMAGE049] denote the weights of the mixing network [Figure 456756DEST_PATH_IMAGE050]. The distributed networks and the mixing network are optimized by mini-batch gradient descent to minimize the following loss:

[Figure 771194DEST_PATH_IMAGE070]

where [Figure 510040DEST_PATH_IMAGE052] is the n-step return bootstrapped from the last state, with n upper-bounded by T. The parameters of the mixing network can therefore be updated as:

[Figure 87783DEST_PATH_IMAGE071]

where [Figure 652756DEST_PATH_IMAGE054] is the learning rate of the mixing network update. To reduce complexity, the weights of the non-output layers are additionally shared across the distributed networks, and the combined weight parameters of the distributed networks are denoted as [Figure 941524DEST_PATH_IMAGE055]. The gradient with respect to [Figure 275553DEST_PATH_IMAGE056] can then be computed as:

[Figure 56821DEST_PATH_IMAGE072]

The update rule of the distributed networks can therefore be derived as:

[Figure 515615DEST_PATH_IMAGE073]

where [Figure 280178DEST_PATH_IMAGE059] and [Figure 406397DEST_PATH_IMAGE060] denote the policy improvement learning rate and the critic learning rate, respectively.
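
The quantities used in Step 2.4 (an n-step return bootstrapped from the last state, a temporal-difference advantage, and a critic loss) can be sketched in their common textbook forms. The discount factor `gamma`, the squared-error loss, and all function names are assumptions for this example; the patent's own expressions are available only as image references and may differ.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of the given rewards plus the discounted value
    estimate of the state reached after the last reward."""
    ret = bootstrap_value
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


def td_advantage(rewards, value_now, bootstrap_value, gamma):
    """Advantage estimate: n-step return minus the current state value."""
    return n_step_return(rewards, bootstrap_value, gamma) - value_now


def critic_loss(targets, values):
    """Assumed mini-batch loss: mean squared error between the n-step
    return targets and the predicted global state values."""
    if len(targets) != len(values):
        raise ValueError("targets and values must have the same length")
    return sum((y - v) ** 2 for y, v in zip(targets, values)) / len(targets)


# Hypothetical two-step trajectory with an assumed discount factor of 0.5.
ret = n_step_return([1.0, 1.0], 2.0, 0.5)
adv = td_advantage([1.0, 1.0], 1.5, 2.0, 0.5)
loss = critic_loss([2.0, 1.0], [1.0, 1.0])
```

A positive advantage raises the probability of the sampled action under the policy gradient, while the critic loss pulls the value estimates toward the n-step return targets.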

Preferably, Step 3 is specifically as follows:

Step 3.1: Feed the actually observed power IoT data, as the agents' observed states and environment information, into the graph-embedding-based network update algorithm; initialize the network parameters and the network learning rate [Figure 686462DEST_PATH_IMAGE061].

Step 3.2: Draw a batch of data [Figure 835814DEST_PATH_IMAGE062] from the experience pool, compute the policy gradient [Figure 358063DEST_PATH_IMAGE063] and the network loss [Figure 837323DEST_PATH_IMAGE064] using the formulas derived in Step 2.4, and update the mixing network parameters using the mixing network parameter update formula from Step 2.4.
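
Drawing a batch from the experience pool in Step 3.2 can be sketched with a fixed-capacity buffer. The class name `ExperiencePool`, the capacity limit with oldest-first eviction, and uniform sampling without replacement are assumptions for illustration; the patent does not specify how the pool is stored or sampled.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity store of transitions with uniform batch sampling."""

    def __init__(self, capacity, seed=0):
        # A bounded deque drops the oldest transition once it is full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition):
        self._buffer.append(transition)

    def sample(self, batch_size):
        """Return `batch_size` distinct stored transitions chosen uniformly."""
        if batch_size > len(self._buffer):
            raise ValueError("not enough stored transitions for this batch size")
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


# Hypothetical usage: integers stand in for (state, action, reward, next state) tuples.
pool = ExperiencePool(capacity=3)
for step in range(5):
    pool.add(step)
batch = pool.sample(2)
```

Each sampled batch would then be used to evaluate the policy gradient and the network loss before the parameter updates are applied.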

Step 3.3: Update the network parameters in the power IoT network using the distributed network parameter update rule described in Step 2.4, until the network converges.

Step 3.4: Update the trained network parameters periodically, or retrain and update them when the power IoT network changes significantly. This meets the access requirements of the devices in the power IoT network and enables customized communication.

The above is merely an embodiment of the present application and is not intended to limit its scope of protection. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (1)

1. A method for cooperation between a wireless access point and a reconfigurable smart surface, characterized by comprising the following steps:
Step 1: build a device communication architecture based on the power IoT network, the device communication architecture comprising M pre-installed access points and J reconfigurable smart surfaces, wherein the cooperative relationship between each access point and its neighboring access points and reconfigurable smart surfaces is modeled as an interaction between agents, that is, as an edge in the input of the graph neural network; the input topology of the message passing graph neural network is constructed, and the message passing graph neural network is used to obtain an embedding representation of the topology, so as to provide services to power IoT terminals;
Step 2: based on the device communication architecture built above for the power IoT network, design a cooperation method between the access points and the reconfigurable smart surfaces that takes maximizing system energy efficiency as its objective while meeting the quality-of-service requirements of massive numbers of devices in the power IoT network in terms of transmission data rate and reliability;
Step 3: based on the cooperation method between access points and reconfigurable smart surfaces proposed in Step 2, each access point and reconfigurable smart surface cooperates according to the trained model to meet the access requirements of massive numbers of devices in the power IoT network;
Step 1 is specifically as follows:
in the device communication architecture of the power IoT network, the pre-installed access points in the network are denoted as [Figure QLYQS_1] and the reconfigurable smart surfaces in the network are denoted as [Figure QLYQS_2]; the M wireless access points and the J reconfigurable smart surfaces are treated as distinct agents, and each is represented as a node in the input of the graph neural network; the access information of the power IoT devices and the hybrid spatial beam configuration between the multiple wireless access points and the multiple reconfigurable smart surfaces are treated as features of the graph topology and fed into the message passing graph neural network, whose message passing mechanism yields a stable graph embedding representation of the node features;
Step 2 is specifically as follows:
Step 2.1: model the system energy efficiency optimization problem as a decentralized partially observable Markov decision process;
to dynamically maximize the system energy efficiency of cooperating wireless access points and reconfigurable smart surfaces, the objective function of the system can be expressed as:
[Figure QLYQS_3]
where [Figure QLYQS_4] denotes the network energy efficiency in time slot t and [Figure QLYQS_5] denotes the user parameters; reconfigurable smart surface element selection, coordinated discrete phase-shift control, and power allocation are considered jointly, and the above system energy efficiency optimization problem is modeled as a decentralized partially observable Markov decision process; after the optimization problem is transformed into a decentralized partially observable Markov decision process, the transformed optimization function is:
[Figure QLYQS_6]
where [Figure QLYQS_7] is a positive coefficient that controls the trade-off between energy efficiency and transmission reliability, [Figure QLYQS_8] is a non-negative parameter that penalizes data-rate violations, [Figure QLYQS_9] is the data-rate limit, [Figure QLYQS_10] takes a fixed value in each time slot, [Figure QLYQS_11] is the data rate in each time slot, [Figure QLYQS_12] is the number of antennas, and [Figure QLYQS_13] denotes the users served jointly by the access points and the reconfigurable smart surfaces;
its global reward function can be expressed as:
[Figure QLYQS_14];
Step 2.2: achieve more efficient cooperative learning by integrating two techniques, graph embedding and differentiated rewards;
the agents represent the wireless access points and the reconfigurable smart surfaces, and the interactions between agents represent the wireless communication environment and its communication modes; the agents and their interactions are modeled as a directed communication graph [Figure QLYQS_15], in which the agents are modeled as nodes I and the interactions between agents are modeled as directed edges [Figure QLYQS_16]; [Figure QLYQS_17] denotes the node features and [Figure QLYQS_18] denotes the edge features;
the node features of wireless access point i comprise the spatial channel information from the access point to its associated devices, the queue information of its associated users, and the access point's local action-observation history:
[Figure QLYQS_19]
the edge features describe the interaction from agent [Figure QLYQS_20] to agent [Figure QLYQS_21], which can be expressed mathematically as:
[Figure QLYQS_22];
Step 2.3: maintain a message passing graph neural network at each distributed node i; in each layer of the message passing graph neural network, every agent first transmits its embedding to its neighboring agents, then aggregates the embeddings received from its neighbors and updates its local hidden state; the message passing process is:
[Figure QLYQS_23]
where [Figure QLYQS_24] denotes the message function and [Figure QLYQS_25] denotes the update operation; after the graph embedding module, agent [Figure QLYQS_26] uses a gated recurrent unit to predict its local action from the output local embedding state [Figure QLYQS_27], the gated recurrent unit being a simplified variant of the long short-term memory network; the local embedding state is:
[Figure QLYQS_28]
the local action [Figure QLYQS_30] taken by agent [Figure QLYQS_29] is sampled from the action generation sub-policy [Figure QLYQS_31];
Step 2.4: let [Figure QLYQS_32] denote the combined parameters of the graph embedding module and the action generation module in the distributed policy; the goal is to maximize the performance function:
[Figure QLYQS_33]
where [Figure QLYQS_34] is the joint state transition under the joint policy [Figure QLYQS_35]; the policy gradient is computed from the advantage function and is given by:
[Figure QLYQS_36]
where [Figure QLYQS_37] is the actual input to the graph embedding and [Figure QLYQS_38] denotes the temporal-difference advantage, given by:
[Figure QLYQS_39]
where [Figure QLYQS_40] denotes the global state value and [Figure QLYQS_41] denotes the global state-action value; to address the credit assignment problem during training, value decomposition is used to train the distributed networks, decomposing the global state value [Figure QLYQS_42] into a form combined through a mixing function:
[Figure QLYQS_43]
where [Figure QLYQS_44] denotes the local state value of agent [Figure QLYQS_45]; during centralized training, each agent uses its local graph embedding features to evaluate its contribution to the improvement of the global reward and receives a differentiated reward accordingly, which further promotes coordination among agents; [Figure QLYQS_46] denotes the weight parameters of the distributed networks, which are shared among the agents, and [Figure QLYQS_47] denotes the weights of the mixing network [Figure QLYQS_48]; the distributed networks and the mixing network are optimized by mini-batch gradient descent to minimize the following loss:
[Figure QLYQS_49]
where [Figure QLYQS_50] is the [Figure QLYQS_51]-step return bootstrapped from the last state, with [Figure QLYQS_52] upper-bounded by T; the parameters of the mixing network can be updated as:
[Figure QLYQS_53]
where [Figure QLYQS_54] is the learning rate of the mixing network update; the weights of the non-output layers are additionally shared across the distributed networks, and the combined weight parameters of the distributed networks are denoted as [Figure QLYQS_55]; the gradient with respect to [Figure QLYQS_56] can be computed as:
[Figure QLYQS_57]
the update rule of the distributed networks can be derived as:
[Figure QLYQS_58]
where [Figure QLYQS_59] and [Figure QLYQS_60] denote the policy improvement learning rate and the critic learning rate, respectively;
Step 3 is specifically as follows:
Step 3.1: feed the actually observed power IoT data, as the agents' observed states and environment information, into the graph-embedding-based network update algorithm; initialize the network parameters and the network learning rate [Figure QLYQS_61];
Step 3.2: draw a batch of data B from the experience pool, compute the policy gradient [Figure QLYQS_62] and the network loss [Figure QLYQS_63] using the formulas derived in Step 2.4, and update the mixing network parameters using the mixing network parameter update formula from Step 2.4;
Step 3.3: update the network parameters in the power IoT network using the distributed network parameter update rule described in Step 2.4, until the network converges;
Step 3.4: update the trained network parameters periodically, or retrain and update them when the power IoT network changes significantly, so as to meet the access requirements of the devices in the power IoT network and enable customized communication.
CN202211429707.5A 2022-11-16 2022-11-16 A method for cooperation between a wireless access point and a reconfigurable smart surface Active CN115499849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211429707.5A CN115499849B (en) 2022-11-16 2022-11-16 A method for cooperation between a wireless access point and a reconfigurable smart surface

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211429707.5A CN115499849B (en) 2022-11-16 2022-11-16 A method for cooperation between a wireless access point and a reconfigurable smart surface

Publications (2)

Publication Number Publication Date
CN115499849A CN115499849A (en) 2022-12-20
CN115499849B true CN115499849B (en) 2023-04-07

Family

ID=85115737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211429707.5A Active CN115499849B (en) 2022-11-16 2022-11-16 A method for cooperation between a wireless access point and a reconfigurable smart surface

Country Status (1)

Country Link
CN (1) CN115499849B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 A UAV network hovering position optimization method based on multi-agent deep reinforcement learning
CN113472419A (en) * 2021-06-23 2021-10-01 西北工业大学 Safe transmission method and system based on space-based reconfigurable intelligent surface
CN115103372A (en) * 2022-06-17 2022-09-23 东南大学 A user scheduling method for multi-user MIMO systems based on deep reinforcement learning
CN115310775A (en) * 2022-07-13 2022-11-08 武汉大学 Multi-agent reinforcement learning rolling scheduling method, device, equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210192358A1 (en) * 2018-05-18 2021-06-24 Deepmind Technologies Limited Graph neural network systems for behavior prediction and reinforcement learning in multple agent environments
CN111612126B (en) * 2020-04-18 2024-06-21 华为技术有限公司 Method and apparatus for reinforcement learning
US11546022B2 (en) * 2020-04-29 2023-01-03 The Regents Of The University Of California Virtual MIMO with smart surfaces
US12136989B2 (en) * 2021-02-01 2024-11-05 Ntt Docomo, Inc. Method and apparatus for user localization and tracking using radio signals reflected by reconfigurable smart surfaces
CN113573293B (en) * 2021-07-14 2022-10-04 南通大学 An Intelligent Emergency Communication System Based on RIS
CN114422056B (en) * 2021-12-03 2023-05-23 北京航空航天大学 Space-to-ground non-orthogonal multiple access uplink transmission method based on intelligent reflecting surface
CN114286369B (en) * 2021-12-28 2024-02-27 杭州电子科技大学 AP and RIS joint selection method of RIS auxiliary communication system
CN114466388B (en) * 2022-02-16 2023-08-08 北京航空航天大学 A smart metasurface-assisted wireless energy-carrying communication method
CN115333143B (en) * 2022-07-08 2024-05-07 国网黑龙江省电力有限公司大庆供电公司 Deep learning multi-agent microgrid collaborative control method based on dual neural network
CN115146538A (en) * 2022-07-11 2022-10-04 河海大学 Power system state estimation method based on message passing graph neural network

Also Published As

Publication number Publication date
CN115499849A (en) 2022-12-20

Similar Documents

Publication Publication Date Title
Gacanin et al. Wireless 2.0: Toward an intelligent radio environment empowered by reconfigurable meta-surfaces and artificial intelligence
Shi et al. Machine learning for large-scale optimization in 6G wireless networks
Liu et al. DeepNap: Data-driven base station sleeping operations through deep reinforcement learning
She et al. Deep learning for ultra-reliable and low-latency communications in 6G networks
Zhang et al. Satellite edge computing with collaborative computation offloading: An intelligent deep deterministic policy gradient approach
Wang et al. AI-based cloud-edge-device collaboration in 6G space-air-ground integrated power IoT
Guim et al. Autonomous lifecycle management for resource-efficient workload orchestration for green edge computing
Chen et al. Edge intelligent networking optimization for internet of things in smart city
CN114626306B (en) Method and system for guaranteeing freshness of regulation and control information of park distributed energy
CN113946423B (en) Multi-task edge computing, scheduling and optimizing method based on graph attention network
Shen et al. EdgeMatrix: A resource-redefined scheduling framework for SLA-guaranteed multi-tier edge-cloud computing systems
Liu et al. Neural network-based event-triggered fault detection for nonlinear Markov jump system with frequency specifications
CN115033359A (en) Internet of things agent multi-task scheduling method and system based on time delay control
Su et al. Joint DNN partition and resource allocation optimization for energy-constrained hierarchical edge-cloud systems
Garg et al. SDN-NFV-aided edge-cloud interplay for 5G-envisioned energy Internet ecosystem
Xu et al. Living with artificial intelligence: A paradigm shift toward future network traffic control
Mishra et al. Enabling cyber‐physical demand response in smart grids via conjoint communication and controller design
Qadeer et al. Deep-deterministic policy gradient based multi-resource allocation in edge-cloud system: A distributed approach
Si et al. When spectrum sharing in cognitive networks meets deep reinforcement learning: Architecture, fundamentals, and challenges
CN117195728A (en) A complex mobile task deployment method based on graph-to-sequence reinforcement learning
CN115499849B (en) A method for cooperation between a wireless access point and a reconfigurable smart surface
CN117938669B (en) A network function chain adaptive orchestration method for 6G inclusive intelligent services
CN114980160A (en) Unmanned aerial vehicle-assisted terahertz communication network joint optimization method and device
CN118510054A (en) A digital twin migration method and terminal in a vehicle edge computing network
Zhou et al. An overview of machine learning-enabled optimization for reconfigurable intelligent surfaces-aided 6G networks: From reinforcement learning to large language models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant