CN115499849B - A method for cooperation between a wireless access point and a reconfigurable smart surface

Info

Publication number: CN115499849B
Application number: CN202211429707.5A
Authority: CN
Inventors: 罗弦; 廖荣涛; 杨荣浩; 李想; 姚渭箐; 董亮; 刘芬; 张岱; 郭岳; 王逸兮; 李磊; 孟浩华; 王敬靖; 胡欢君; 龙霏; 袁翔宇; 王博涛
Original assignee: Information and Telecommunication Branch of State Grid Hubei Electric Power Co Ltd
Current assignee: Information and Telecommunication Branch of State Grid Hubei Electric Power Co Ltd
Priority date: 2022-11-16
Filing date: 2022-11-16
Publication date: 2023-04-07
Anticipated expiration: 2042-11-16
Also published as: CN115499849A

Abstract

This application relates to a method for cooperation between a wireless access point and a reconfigurable smart surface, comprising the following steps: building a device communication architecture based on the power Internet of Things network; designing, on the basis of that architecture, a corresponding cooperation method between access points and reconfigurable smart surfaces that aims to maximize system energy efficiency while meeting the quality-of-service requirements of massive numbers of devices in the power Internet of Things network with respect to transmission data rate and reliability; and having each access point and reconfigurable smart surface cooperate according to the trained model to meet the access requirements of massive numbers of devices in the power Internet of Things network. This application models the large wireless communication network as a graph and uses graph embedding to reduce its dimensionality and obtain an efficient graph representation, which effectively reduces the complexity of model training and enables highly customized communication.

Description

A method for cooperation between a wireless access point and a reconfigurable smart surface

Technical Field

The present application belongs to the technical field of the power Internet of Things and, in particular, relates to a method for cooperation between a wireless access point and a reconfigurable smart surface.

Background Art

In recent years, with the rapid development of the power Internet of Things, massive numbers of devices have been deployed at its network edge. Because the power network is complex and large, relying on manual control alone makes management difficult and costly, so new information and communication technologies are needed to improve the operating performance and control efficiency of the power system. Intelligent control of the power Internet of Things network requires real-time sensing and measurement of how the power network is dispatched and how it performs. The network must therefore support access by massive numbers of edge devices and the transmission of massive amounts of data in order to operate efficiently and reliably. As information and communication technology continues to advance, the new generation of mobile communication technology can provide high-speed, stable service when large numbers of power devices connect to the power network. However, because network edge devices are heterogeneous, highly customized and intelligent communication, that is, dynamic configuration of network resources to support ultra-dense connections, cannot currently be achieved.

The reconfigurable smart surface is a new, revolutionary technology that integrates a large number of low-cost passive reflective elements on a planar surface and intelligently reconfigures the wireless propagation environment, thereby significantly improving the performance of wireless communication networks. It makes a high degree of customization possible: by reflecting signals in a highly controllable and intelligent manner, it provides new degrees of freedom for further improving wireless link performance and paves the way for intelligent, programmable wireless environments. With reconfigurable smart surfaces, wireless access points can cooperate with the surfaces to configure hybrid spatial beams flexibly, boost data transmission on demand, suppress interference flexibly, and multiplex efficiently across the hybrid spatial and power domains, effectively enabling highly customized and intelligent communication. Therefore, for power Internet of Things scenarios with heterogeneous grids and massive numbers of devices, an effective technique for cooperation between wireless access points and reconfigurable smart surfaces urgently needs to be designed to achieve highly customized and intelligent communication.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a method for cooperation between a wireless access point and a reconfigurable smart surface that models the wireless communication network as a graph and uses graph embedding to obtain an embedded representation of the network. Graph embedding yields a low-dimensional representation of the graph, reduces the complexity of model training, and enables highly customized communication.

To achieve the above purpose, the present application provides the following technical solution:

An embodiment of the present application provides a method for cooperation between a wireless access point and a reconfigurable smart surface, characterized by comprising the following steps:

Step 1: Build a device communication architecture based on the power Internet of Things network. The network architecture comprises M pre-installed access points and J reconfigurable smart surfaces. The cooperative relationship between each access point and its adjacent access points and reconfigurable smart surfaces is modeled as an interaction between agents, that is, as an edge in the input of the graph neural network. The input topology of a message-passing graph neural network is constructed in this way, and the message-passing graph neural network is used to obtain an embedded representation of the topology, so as to serve power Internet of Things terminals;

Step 2: On the basis of the device communication architecture built above, design a corresponding cooperation method between access points and reconfigurable smart surfaces that aims to maximize system energy efficiency while meeting the quality-of-service requirements of massive numbers of devices in the power Internet of Things network with respect to transmission data rate and reliability;

Step 3: Following the cooperation method proposed in step 2, each access point and reconfigurable smart surface cooperates according to the trained model to meet the access requirements of massive numbers of devices in the power Internet of Things network.

Step 1 is specifically as follows:

Step 1: In the device communication architecture of the power Internet of Things network, the pre-installed access points in the network are represented as [symbol not reproduced] and the reconfigurable smart surfaces in the network as [symbol not reproduced]. The M wireless access points and J reconfigurable smart surfaces are treated as distinct agents and represented as nodes in the input of the graph neural network. The access information of the power Internet of Things devices and the hybrid spatial beam configuration between the multiple wireless access points and multiple reconfigurable smart surfaces are treated as features of the graph topology and fed into the message-passing graph neural network, whose message-passing mechanism yields a stable graph embedding of the node features.
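The graph input described in step 1 can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `CommGraph` class, the `ap0`/`ris0` naming scheme, the `links` tuple format, and the single-number placeholder features are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class CommGraph:
    """Directed graph whose nodes are agents (access points and smart surfaces)."""
    node_features: dict = field(default_factory=dict)  # node id -> list[float]
    edge_features: dict = field(default_factory=dict)  # (src, dst) -> list[float]

    def add_agent(self, node_id, features):
        self.node_features[node_id] = list(features)

    def add_interaction(self, src, dst, features):
        # Both endpoints must already be registered as agents.
        if src not in self.node_features or dst not in self.node_features:
            raise KeyError("unknown agent")
        self.edge_features[(src, dst)] = list(features)

    def in_neighbors(self, node_id):
        return [s for (s, d) in self.edge_features if d == node_id]


def build_graph(num_aps, num_ris, links):
    """links: iterable of (src, dst, features) tuples (hypothetical format)."""
    g = CommGraph()
    for m in range(num_aps):
        g.add_agent(f"ap{m}", [0.0])    # placeholder device-access feature
    for j in range(num_ris):
        g.add_agent(f"ris{j}", [0.0])   # placeholder beam-configuration feature
    for src, dst, feats in links:
        g.add_interaction(src, dst, feats)
    return g


graph = build_graph(
    2, 1,
    [("ap0", "ris0", [1.0]), ("ris0", "ap1", [0.5]), ("ap0", "ap1", [0.2])],
)
```

Keeping edges directed lets an interaction from one agent to another carry different features than the reverse interaction, which matches the directed communication graph the text describes.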

Step 2 is specifically as follows:

Step 2.1: To dynamically maximize the system energy efficiency of the cooperation between wireless access points and reconfigurable smart surfaces, the objective function of the system can be expressed as:

[equation not reproduced]

where the objective is stated in terms of the network energy efficiency at time slot t and the user parameters. By jointly considering reconfigurable smart surface element selection, coordinated discrete phase-shift control, and the power allocation strategy, this long-term energy-efficiency optimization problem is modeled as a decentralized partially observable Markov decision process. After the conversion, the optimization function is:

[equation not reproduced]

The quantities in this expression are: a positive coefficient that controls the trade-off between energy efficiency and transmission reliability; a non-negative parameter that penalizes data-rate violations; the data-rate limit; a quantity that takes a fixed value in each time slot; the data rate in each time slot; the number of antennas; and the users served cooperatively by the access points and reconfigurable smart surfaces.

Its global reward function can be expressed as:

[equation not reproduced]
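The reward expression itself is not reproduced in this text. As a rough illustration of how a trade-off coefficient and a rate-violation penalty could interact, the sketch below assumes energy efficiency is the sum rate divided by total power, and that the penalty grows linearly with each user's shortfall against the rate limit. Both the functional form and all names are assumptions, not the application's formula.

```python
def energy_efficiency(rates_bps, total_power_w):
    """Sum rate divided by total consumed power (bit per joule)."""
    if total_power_w <= 0:
        raise ValueError("total power must be positive")
    return sum(rates_bps) / total_power_w


def global_reward(rates_bps, total_power_w, rate_limit_bps, tradeoff, penalty):
    """Assumed reward: weighted efficiency minus a penalty per unit of rate shortfall."""
    shortfall = sum(max(0.0, rate_limit_bps - r) for r in rates_bps)
    return tradeoff * energy_efficiency(rates_bps, total_power_w) - penalty * shortfall


# Two users at 4.0 and 1.0 bit/s, 2.0 W total power, rate limit 2.0 bit/s.
r = global_reward([4.0, 1.0], total_power_w=2.0, rate_limit_bps=2.0,
                  tradeoff=1.0, penalty=0.5)
```

In this toy case the efficiency is 2.5 and only the second user falls short (by 1.0), so a larger `penalty` pushes the agents toward reliability while a larger `tradeoff` favors efficiency.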

Step 2.2: More efficient cooperative learning is achieved by integrating two techniques, graph embedding and differentiated rewards. The agents represent the wireless access points and reconfigurable smart surfaces, and the interactions between agents represent the wireless communication environment and its modes of communication. The agents and their interactions are modeled as a directed communication graph in which the agents are the nodes I and the interactions between agents are the directed edges, with each node and each edge carrying its own features.

The node features of wireless access point i comprise the spatial channel information from the access point to its associated devices, the queue information of the associated users, and the access point's local action-observation history:

[equation not reproduced]

The edge features describe the interaction from one agent to another, which can be expressed mathematically as:

[equation not reproduced]
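A small sketch may help picture the three groups of access-point node features named above. The class name, the bounded history length, and the zero-padding scheme are illustrative assumptions; the application does not specify how the features are encoded.

```python
from collections import deque


class AccessPointNode:
    """Holds the three feature groups named in the text for one access point."""

    def __init__(self, history_len):
        self.channel = []                          # spatial channel info to associated devices
        self.queues = []                           # queue lengths of associated users
        self.history = deque(maxlen=history_len)   # recent (action, observation) pairs

    def record(self, action, observation):
        self.history.append((float(action), float(observation)))

    def features(self):
        """Flatten to a fixed-length vector, zero-padding a short history."""
        flat = [x for pair in self.history for x in pair]
        pad = [0.0] * (2 * self.history.maxlen - len(flat))
        return list(self.channel) + list(self.queues) + pad + flat


node = AccessPointNode(history_len=2)
node.channel = [0.9, 0.1]
node.queues = [3.0]
node.record(action=1, observation=0.5)
vec = node.features()
```

Bounding the history keeps the feature vector at a fixed length, which a neural network input layer generally requires.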

Step 2.3: Because graph nodes and edges have high-dimensional features in large-scale networks, a graph-embedding-based action generation module is proposed, in which a message-passing graph neural network is maintained at each distributed node. Like a multilayer perceptron, the message-passing graph neural network has a layered structure. In each layer, every agent first transmits its embedding to its neighboring agents, then aggregates the embeddings received from neighboring agents and updates its local hidden state. The message-passing process is given by:

[equation not reproduced]

where the expression involves a message function and an update operation. After the graph embedding module, each agent uses a gated recurrent unit, a simplified variant of the long short-term memory network, to predict its local action from the local embedding state output by the module. The local embedding state is given by:

[equation not reproduced]

The local action taken by each agent is sampled from its action-generation sub-policy.
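The send, aggregate, and update cycle of one message-passing layer can be sketched as follows. The message function (scaling the sender state by a scalar edge feature), the sum aggregation, and the `tanh` update are stand-ins chosen for brevity; the application's own message and update functions are not reproduced in this text.

```python
import math


def message(h_src, edge_feat):
    """Assumed message function: scale the sender state by the edge feature."""
    return [edge_feat * x for x in h_src]


def update(h_self, aggregated):
    """Assumed update operation: tanh of own state plus the aggregated messages."""
    return [math.tanh(a + b) for a, b in zip(h_self, aggregated)]


def message_passing_layer(hidden, edges):
    """hidden: node -> list[float]; edges: (src, dst) -> scalar edge feature."""
    dim = len(next(iter(hidden.values())))
    inbox = {n: [0.0] * dim for n in hidden}
    for (src, dst), feat in edges.items():
        msg = message(hidden[src], feat)                        # 1) send along the edge
        inbox[dst] = [a + b for a, b in zip(inbox[dst], msg)]   # 2) sum-aggregate
    return {n: update(hidden[n], inbox[n]) for n in hidden}     # 3) update local state


hidden = {"ap0": [1.0, 0.0], "ris0": [0.0, 1.0]}
edges = {("ap0", "ris0"): 0.5}
out = message_passing_layer(hidden, edges)
```

Stacking several such layers lets information from agents more than one hop away reach each node's embedding.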

Step 2.4: Denote the combined parameters of the graph embedding module and the action generation module in the distributed policy by [symbol not reproduced]. The goal is to maximize the performance function:

[equation not reproduced]

in which the joint state transitions follow the joint policy. The policy gradient is computed from the advantage function and is given by:

[equation not reproduced]

which involves the actual input of the graph embedding and the temporal-difference advantage, the latter given by:

[equation not reproduced]

defined in terms of the global state value and the global state-action value. To address the credit assignment problem during training, value decomposition is used to train the distributed networks: the global state value is decomposed into local state values combined through a mixing function:

[equation not reproduced]

where each agent contributes its local state value. During centralized training, each agent obtains a differentiated reward by evaluating, from its local graph embedding features, its contribution to the improvement of the global reward, which further promotes coordination among agents. The weight parameters of the distributed networks are shared among agents, and the mixing network has its own weights. The distributed networks and the mixing network are optimized by mini-batch gradient descent so as to minimize the following loss:

[equation not reproduced]

where the target is the n-step return bootstrapped from the last state, with n bounded above by T. The parameters of the mixing network are updated by:

[equation not reproduced]

using the learning rate of the mixing network update. The weights of the non-output layers are further shared across the distributed networks. With respect to the resulting combined weight parameters of the distributed networks, the gradient can be computed as:

[equation not reproduced]

The update rule of the distributed networks can then be derived as:

[equation not reproduced]

in which the two learning rates are the policy-improvement learning rate and the critic learning rate, respectively.
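As an illustration of the n-step return bootstrapped from the last state and the advantage built on it, the following sketch uses the generic textbook forms. The discount factor `gamma`, the squared-error loss, and the batch layout are assumptions; the application's own expressions are not reproduced in this text.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of n rewards plus the discounted value of the last state."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td_advantage(rewards, value_start, bootstrap_value, gamma):
    """Advantage estimate: n-step return minus the value of the starting state."""
    return n_step_return(rewards, bootstrap_value, gamma) - value_start


def critic_loss(batch, gamma):
    """Mean squared error between n-step returns and predicted start values.

    batch: list of (rewards, value_start, bootstrap_value) tuples (hypothetical layout).
    """
    errors = [
        (n_step_return(rw, v_last, gamma) - v0) ** 2
        for rw, v0, v_last in batch
    ]
    return sum(errors) / len(errors)


g = n_step_return([1.0, 1.0], bootstrap_value=4.0, gamma=0.5)
adv = td_advantage([1.0, 1.0], value_start=2.0, bootstrap_value=4.0, gamma=0.5)
loss = critic_loss([([1.0, 1.0], 2.0, 4.0)], gamma=0.5)
```

A positive advantage means the sampled actions did better than the critic expected, so a policy-gradient step would raise their probability.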

Step 3 is specifically as follows:

Step 3.1: Feed the actually observed power Internet of Things data, as the agents' observed states and the environment information, into the graph-embedding-based network update algorithm; initialize the network parameters and the network learning rates.

Step 3.2: Draw a batch of data from the experience pool, compute the policy gradient and the network loss according to the formulas derived in step 2.4, and update the mixing network parameters using the mixing network update formula in step 2.4.

Step 3.3: Update the network parameters in the power Internet of Things network according to the distributed network parameter update rule described in step 2.4, until the network converges.

Step 3.4: Update the trained network parameters periodically, or retrain and update them when the power Internet of Things network changes substantially, so as to meet the access requirements of devices in the power Internet of Things network and achieve customized communication.
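Step 3.2 draws a batch from an experience pool. A minimal sketch of such a pool is given below; the fixed capacity, the first-in-first-out eviction, and uniform sampling without replacement are assumptions, since the application does not describe the pool's internals.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity buffer of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition):
        self._buffer.append(transition)

    def sample(self, batch_size):
        # Sample without replacement; never ask for more than is stored.
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self):
        return len(self._buffer)


pool = ExperiencePool(capacity=3, seed=0)
for t in range(5):
    pool.add({"t": t})   # placeholder transition record
batch = pool.sample(2)
```

Each training iteration would then compute the policy gradient and loss on `batch` before updating the mixing network and the distributed networks.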

Compared with the prior art, the beneficial effects of the present application are as follows. Addressing the needs of power Internet of Things networks, the application proposes a framework for cooperation between wireless access points and reconfigurable smart surfaces that meets the access requirements of massive numbers of devices. By realizing cooperation between wireless access points and reconfigurable smart surfaces, it dynamically maximizes system energy efficiency and thereby achieves efficient communication. In addition, it proposes a graph-embedding-based representation of the wireless network: the large wireless communication network is modeled as a graph, and graph embedding reduces its dimensionality to obtain an efficient graph representation. The proposed method effectively reduces the complexity of model training and enables highly customized communication.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present application and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.

FIG. 1 is a flowchart of the method according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. Note that similar reference numerals and letters denote similar items in the drawings; once an item has been defined in one drawing, it need not be defined or explained again in subsequent drawings.

The terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises it.

Referring to FIG. 1, the present application provides a method for cooperation between a wireless access point and a reconfigurable smart surface, comprising the following steps.

Preferably, step 1 is as follows:

Step 1: In the device communication architecture of the power Internet of Things network, the pre-installed access points in the network are represented as [symbol not reproduced] and the reconfigurable smart surfaces in the network as [symbol not reproduced]. The M wireless access points and J reconfigurable smart surfaces are treated as distinct agents and represented as nodes in the input of the graph neural network. The access information of the power Internet of Things devices and the hybrid spatial beam configuration between the multiple wireless access points and multiple reconfigurable smart surfaces are treated as features of the graph topology and fed into the message-passing graph neural network, whose message-passing mechanism yields a stable graph embedding of the node features.

Preferably, step 2 is as follows:

Step 2.1: Because the edge of the power Internet of Things network hosts massive numbers of devices, a high-performance access framework for them must be designed with care. By designing the cooperation between access points and reconfigurable smart surfaces, the hybrid beams can be reconfigured flexibly and in a coordinated way, so that devices access the communication network in a coordinated manner and customizable intelligent communication is achieved. Therefore, to dynamically maximize the system energy efficiency of the cooperation between wireless access points and reconfigurable smart surfaces, the objective function of the system can be expressed as:

[equation not reproduced]

where the objective is stated in terms of the network energy efficiency at time slot t. This objective can be modeled as a constrained Markov decision process. However, solving the problem centrally is computationally inefficient, because the joint state-action space is large and the exchange of high-dimensional information from multiple wireless access points and reconfigurable smart surfaces to a central controller incurs heavy overhead. To handle the problem efficiently and with low complexity, and to maximize network energy efficiency while guaranteeing diverse user performance requirements, we jointly consider reconfigurable smart surface element selection, coordinated discrete phase-shift control, and the power allocation strategy, and model the long-term energy-efficiency optimization problem as a decentralized partially observable Markov decision process. Specifically, a partially observable Markov decision process provides a general framework for describing Markov decision processes with incomplete information, and the decentralized variant extends it to settings in which decisions are made at distributed locations.
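For reference, a decentralized partially observable Markov decision process is conventionally written as a tuple. The symbols below follow common textbook notation and are assumptions here, since this text does not reproduce the application's own notation.

```latex
\[
\big\langle\, \mathcal{I},\ \mathcal{S},\ \{\mathcal{A}_i\}_{i \in \mathcal{I}},\ P,\ R,\ \{\Omega_i\}_{i \in \mathcal{I}},\ O,\ \gamma \,\big\rangle
\]
```

Read against this application, $\mathcal{I}$ would be the set of agents (access points and reconfigurable smart surfaces), $\mathcal{S}$ the global network states, $\mathcal{A}_i$ the local actions of agent $i$, $P(s' \mid s, \mathbf{a})$ the state transition probability under the joint action $\mathbf{a}$, $R(s, \mathbf{a})$ the shared reward, $\Omega_i$ the local observations of agent $i$, $O(\mathbf{o} \mid s', \mathbf{a})$ the observation function, and $\gamma$ a discount factor. Each agent chooses its action from its own observation history only, which is what makes the process both partially observable and decentralized.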

Based on Lyapunov optimization theory, the above optimization problem can be converted into a decentralized partially observable Markov decision process. The converted optimization function is:

[equation not reproduced]

The quantities in this expression are: a positive coefficient that controls the trade-off between energy efficiency and transmission reliability; a non-negative parameter that penalizes data-rate violations; the data-rate limit; a quantity that takes a fixed value in each time slot; the data rate in each time slot; the number of antennas; and the users served cooperatively by the access points and reconfigurable smart surfaces.
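Lyapunov optimization typically replaces a long-term constraint with a virtual queue and then minimizes a per-slot drift-plus-penalty term. Because the converted expression is not reproduced in this text, the sketch below uses the generic textbook forms: the queue update, the drift-plus-penalty term, and all names are assumptions and should not be read as the application's formula.

```python
def update_virtual_queue(queue, rate_limit, rate):
    """Backlog grows when the rate falls short of the limit and drains otherwise."""
    return max(queue + rate_limit - rate, 0.0)


def drift_plus_penalty(queue, rate_limit, rate, tradeoff, efficiency):
    """Per-slot quantity to minimize: queue-weighted shortfall minus weighted efficiency."""
    return queue * (rate_limit - rate) - tradeoff * efficiency


q = 0.0
for rate in [1.0, 1.0, 4.0]:   # per-slot data rates; the limit is 2.0
    q = update_virtual_queue(q, rate_limit=2.0, rate=rate)
```

The queue acts as a memory of past violations: two slots below the limit build up a backlog of 2.0, and one slot well above it clears the backlog again.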

Step 2.2: The optimization problem described in step 2.1 can be solved with conventional multi-agent reinforcement learning. However, because neighboring agents must exchange information in order to cooperate, conventional methods incur high communication overhead and latency when handling high-dimensional information, and existing multi-agent reinforcement learning methods are therefore inefficient at solving highly coupled decentralized partially observable Markov decision processes. We extend the centralized-training, decentralized-execution paradigm commonly used in existing multi-agent reinforcement learning algorithms and achieve more efficient cooperative learning by integrating two techniques, graph embedding and differentiated rewards. The agents represent the wireless access points and reconfigurable smart surfaces, and the interactions between agents represent the wireless communication environment and its modes of communication. The agents and their interactions are modeled as a directed communication graph in which the agents are the nodes I and the interactions between agents are the directed edges, with each node and each edge carrying its own features.

The edge features describe the interaction from one agent to another, which can be expressed mathematically as:

[equation not reproduced]

Step 2.3: Because graph nodes and edges have high-dimensional features in large-scale networks, we propose a graph-embedding-based action generation module. The module uses a message-passing graph neural network to learn low-dimensional embedding features of the directed graph, which improves the generalization ability of the network and strengthens the cooperation between wireless access points and reconfigurable smart surfaces while requiring only low information-exchange overhead.

We maintain a message-passing graph neural network at each distributed node. Like a multilayer perceptron, the message-passing graph neural network has a layered structure. In each layer, every agent first transmits its embedding to its neighboring agents, then aggregates the embeddings received from neighboring agents and updates its local hidden state. The message-passing process is given by:

[equation not reproduced]

where the expression involves a message function and an update operation. After the graph embedding module, each agent uses a gated recurrent unit, a simplified variant of the long short-term memory network, to predict its local action from the local embedding state output by the module. The local embedding state is given by:

[equation not reproduced]

The local action taken by each agent is sampled from its action-generation sub-policy.
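To make the gated recurrent unit and the sampling step concrete, here is a minimal sketch of one GRU step followed by sampling a discrete action from a softmax over a linear output layer. It uses one common GRU formulation without bias terms; the weight layout in `p`, the output matrix `w_out`, and the softmax sub-policy are assumptions, not details taken from the application.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_cell(x, h, p):
    """One GRU step; p maps names to weight matrices (hypothetical layout)."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], rh))]
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(h, w_out, rng):
    """Sample a discrete local action from softmax(w_out @ h)."""
    probs = softmax(matvec(w_out, h))
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, probs


zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {k: zeros for k in ("Wz", "Uz", "Wr", "Ur", "Wn", "Un")}
h_next = gru_cell([1.0, -1.0], [0.4, 0.8], params)
action, probs = sample_action(
    h_next, [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], random.Random(0)
)
```

With all-zero weights both gates equal 0.5 and the candidate state is zero, so the cell simply halves the previous hidden state, which makes the example easy to check by hand.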

Step 2.4: Denote the combined parameters of the graph embedding module and the action generation module in the distributed policy by [symbol not reproduced]. Our goal is to maximize the performance function:

[equation not reproduced]

in which the joint state transitions follow the joint policy. We therefore compute the policy gradient from the advantage function, which is given by:

[equation not reproduced]

which involves the actual input of the graph embedding and the temporal-difference advantage, the latter given by:

[equation not reproduced]

defined in terms of the global state value and the global state-action value. To address the credit assignment problem during training, we use value decomposition to train the distributed networks: the global state value is decomposed into local state values combined through a mixing function:

[equation not reproduced]

where each agent contributes its local state value. During centralized training, each agent obtains a differentiated reward by evaluating, from its local graph embedding features, its contribution to the improvement of the global reward, which further promotes coordination among agents. The weight parameters of the distributed networks are shared among agents, and the mixing network has its own weights. The distributed networks and the mixing network are optimized by mini-batch gradient descent so as to minimize the following loss:

[equation not reproduced]

where the target is the n-step return bootstrapped from the last state, with n bounded above by T. The parameters of the mixing network can therefore be updated by:

[equation not reproduced]

using the learning rate of the mixing network update. To reduce complexity, we further share the weights of the non-output layers across the distributed networks. With respect to the resulting combined weight parameters of the distributed networks, the gradient can be computed as:

[equation not reproduced]

The update rule of the distributed networks can therefore be derived as:

[equation not reproduced]

in which the two learning rates are the policy-improvement learning rate and the critic learning rate, respectively.
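The idea of combining local state values through a mixing function and updating the mixer by gradient descent can be sketched as below. The linear mixer with non-negative weights is borrowed from monotonic value-decomposition methods such as QMIX and is an assumption; the application's actual mixing function and update formula are not reproduced in this text.

```python
def mix_values(local_values, raw_weights, bias):
    """Global value as a weighted sum of local values with non-negative weights."""
    weights = [abs(w) for w in raw_weights]   # abs() keeps the mix monotonic
    return sum(w * v for w, v in zip(weights, local_values)) + bias


def mixing_gradient(local_values, raw_weights, bias, target):
    """Gradient of 0.5 * (V_global - target)^2 w.r.t. raw_weights and bias."""
    error = mix_values(local_values, raw_weights, bias) - target
    sign = [1.0 if w >= 0 else -1.0 for w in raw_weights]
    grad_w = [error * s * v for s, v in zip(sign, local_values)]
    grad_b = error
    return grad_w, grad_b


def sgd_step(raw_weights, bias, grad_w, grad_b, lr):
    """Plain gradient-descent update with learning rate lr."""
    return [w - lr * g for w, g in zip(raw_weights, grad_w)], bias - lr * grad_b


w, b = [1.0, -1.0], 0.0
v_global = mix_values([2.0, 3.0], w, b)
grad_w, grad_b = mixing_gradient([2.0, 3.0], w, b, target=4.0)
w2, b2 = sgd_step(w, b, grad_w, grad_b, lr=0.1)
```

Because every mixing weight is non-negative, raising any agent's local value can never lower the global value, which is what lets each agent act greedily on its own value during decentralized execution.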

Preferably, step 3 is as follows:

Step 3.1: Feed the actually observed power Internet of Things data, as the agents' observed states and the environment information, into the graph-embedding-based network update algorithm; initialize the network parameters and the network learning rates.

Step 3.2: Draw a batch of data from the experience pool, compute the policy gradient and the network loss according to the formulas derived in step 2.4, and update the mixing network parameters using the mixing network update formula in step 2.4.

Step 3.3: Update the network parameters in the power Internet of Things network according to the distributed network parameter update rule described in step 2.4, until the network converges.

Step 3.4: Update the trained network parameters periodically, or retrain and update them when the power Internet of Things network changes substantially. This meets the access requirements of devices in the power Internet of Things network and achieves customized communication.
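Step 3.4 names two retraining triggers: a periodic schedule and a substantial change in the network. One way to express both in code is sketched below. Measuring "substantial change" as the fraction of links added or removed, as well as the function name, parameters, and threshold, are assumptions made for illustration.

```python
def should_retrain(slots_since_update, update_period,
                   old_topology, new_topology, change_threshold):
    """Retrain on a fixed schedule or when the set of links changes substantially."""
    if slots_since_update >= update_period:
        return True
    old, new = set(old_topology), set(new_topology)
    union = old | new
    if not union:
        return False
    changed = len(old ^ new) / len(union)   # fraction of links added or removed
    return changed > change_threshold


periodic = should_retrain(10, 10, [], [], 0.5)
stable = should_retrain(1, 10, [("ap0", "ris0")], [("ap0", "ris0")], 0.5)
rewired = should_retrain(1, 10, [("ap0", "ris0")], [("ap0", "ris1")], 0.5)
```

In the three calls above, the first fires because the update period has elapsed, the second does not fire because nothing changed, and the third fires because the only link was replaced.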

The above are merely embodiments of the present application and are not intended to limit its scope of protection. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims

1. A method for cooperation between a wireless access point and a reconfigurable smart surface, characterized by comprising the following steps:

Step 1: Build a device communication architecture based on the power Internet of Things network, comprising M pre-installed access points and J reconfigurable smart surfaces, wherein the collaborative relationship of each access point with adjacent access points and reconfigurable smart surfaces is modeled as an interaction between intelligent agents, that is, as an edge in the graph neural network input; and construct the input topology of the message passing graph neural network, the message passing graph neural network being used to obtain an embedded representation of the topology so as to provide services for the power Internet of Things terminals;

Step 2: Based on the device communication architecture built in step 1, design the corresponding access point and reconfigurable smart surface collaboration method so as to maximize the system energy efficiency while meeting the quality-of-service requirements of massive devices in the power Internet of Things network in terms of transmission data rate and reliability;

Step 3: Based on the collaboration method between access points and reconfigurable smart surfaces proposed in step 2, each access point collaborates with the reconfigurable smart surface according to the trained model to meet the access needs of massive devices in the power Internet of Things network;

Step 1 is specifically as follows:

In the device communication architecture of the power Internet of Things network, the pre-installed access points in the network are denoted

, and the reconfigurable smart surfaces in the network are denoted

. The M wireless access points and J reconfigurable smart surfaces are represented as distinct intelligent agent nodes, that is, as nodes in the graph neural network input. The access information of power IoT devices and the hybrid spatial beam configuration between the multiple wireless access points and the multiple reconfigurable smart surfaces are treated as features in the graph topology and input into the message passing graph neural network. A stable graph embedding representation of the node features is obtained through the message passing mechanism of the message passing graph neural network.

Step 2 is specifically as follows:

Step 2.1: Model the system energy efficiency optimization problem as a decentralized partially observable Markov decision process;

In order to dynamically maximize the system energy efficiency of the collaboration between wireless access points and reconfigurable smart surfaces, the objective function of the system can be expressed as:

where

represents the network energy efficiency at time slot t, and

denotes the user parameters and the joint reconfigurable smart surface unit selection, coordinated discrete phase shift control and power allocation strategy. The above system energy efficiency optimization problem is modeled as a decentralized partially observable Markov decision process; after this conversion, the optimization function is as follows:

where

is a non-negative parameter that imposes a penalty on data rate violations,

denotes the data rate limit,

is a fixed value in each time slot,

denotes the data rate in each time slot,

denotes the number of antennas, and

denotes the users jointly served by the access points and the reconfigurable smart surfaces.

The global reward function can be expressed as:

;

Step 2.2: Achieve more efficient collaborative learning by integrating graph embedding and different rewards;

The agents represent the wireless access points and reconfigurable smart surfaces, and the interactions between agents represent the wireless communication environment and its communication mode. The agents and their interactions are modeled as a directed communication graph

, where

denotes the node features and

denotes the edge features.

The node features of wireless access point i include the spatial channel information from the access point to its associated devices, the queue information of the associated users, and the local action-observation history of the access point:

The edge features describe the interaction from agent

to agent

, which can be expressed mathematically as:

;

Step 2.3: Maintain a message passing graph neural network at each distributed node i. In each message passing graph neural network layer, each agent first transmits the embedding information to its neighboring agents, then aggregates the embedding information from the neighboring agents and updates its local hidden state.

The message passing process is shown below:

where

denotes the message function and

denotes the update operation. After the graph embedding module, agent

uses a gated recurrent unit to output the embedded local state

. The local action used by agent

, denoted

, is obtained by sampling from the action-generation sub-policy

;

Step 2.4: Denote the combined parameters of the graph embedding module and the action generation module in the distributed policy as

; the goal is to maximize the performance function:

where

follows the joint policy

, in which

is the actual input of the graph embedding, and

denotes the temporal-difference advantage, which is given by:

where

denotes the global state value;

is decomposed into a form combined with a mixing function, as shown below:

where

represents an agent and

denotes the hybrid network,

where

is the return bootstrapped from the last state over

steps, the upper limit of

being T. The parameters of the hybrid network can be updated as follows:

where

, and the gradient with respect to

can be calculated as:

The update rule of the distributed network can be derived as follows:

where

and

denote the policy improvement learning rate and the critic learning rate, respectively;

Step 3 is specifically as follows:

Step 3.1: Input the actually observed power IoT data, as the agents' observation states and environmental information, into the graph-embedding-based network update algorithm; initialize the network parameters and the network learning rates

;

Step 3.2: Sample a batch of data B from the experience pool and, according to the formulas derived in step 2.4, compute the policy gradient

and the network loss

, then update the hybrid network parameters based on the hybrid network parameter update formula in step 2.4;

Step 3.3: Then update the network parameters of the power Internet of Things network according to the distributed network parameter update algorithm described in step 2.4, until the network converges;

Step 3.4: The trained network parameters are updated periodically, or the network is retrained and its parameters updated when the power Internet of Things network changes significantly, so as to meet the access requirements of devices in the power Internet of Things network and realize customized communication.
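
For illustration only, one message passing round of the kind recited in step 2.3 (each agent sends its embedding to its neighbors, aggregates the incoming embeddings, and updates its local hidden state) could be sketched as follows. The mean aggregation, the particular `message_fn` and `update_fn`, and the node names `AP1`, `AP2` and `RIS1` are assumptions chosen for the example; the claim does not fix them, and the gated recurrent unit is omitted here.

```python
def message_passing_round(hidden, edges, message_fn, update_fn):
    """Run one synchronous message passing round on a directed graph.

    hidden : dict mapping node name -> list of floats (local hidden state)
    edges  : list of (src, dst) pairs, one per directed interaction
    """
    inbox = {node: [] for node in hidden}
    for src, dst in edges:
        # Each agent transmits a message built from its current embedding.
        inbox[dst].append(message_fn(hidden[src], hidden[dst]))

    new_hidden = {}
    for node, h in hidden.items():
        msgs = inbox[node]
        if msgs:
            # Aggregate incoming messages element-wise (mean is an assumption).
            agg = [sum(vals) / len(msgs) for vals in zip(*msgs)]
        else:
            agg = [0.0] * len(h)
        new_hidden[node] = update_fn(h, agg)
    return new_hidden


# Toy graph: two access points and one reconfigurable smart surface.
hidden = {"AP1": [1.0, 0.0], "AP2": [0.0, 1.0], "RIS1": [1.0, 1.0]}
edges = [("AP1", "RIS1"), ("AP2", "RIS1"), ("RIS1", "AP1")]
send_own_state = lambda h_src, h_dst: h_src
blend = lambda h, m: [0.5 * a + 0.5 * b for a, b in zip(h, m)]
updated = message_passing_round(hidden, edges, send_own_state, blend)
```

Stacking several such rounds lets information from agents more than one hop away reach each node's embedding.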