CN115499849B - Wireless access point and reconfigurable intelligent surface cooperation method - Google Patents

Info

Publication number
CN115499849B
Authority
CN
China
Prior art keywords
network
things
access point
intelligent
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211429707.5A
Other languages
Chinese (zh)
Other versions
CN115499849A (en
Inventor
罗弦
廖荣涛
杨荣浩
李想
姚渭箐
董亮
刘芬
张岱
郭岳
王逸兮
李磊
孟浩华
王敬靖
胡欢君
龙霏
袁翔宇
王博涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Telecommunication Branch of State Grid Hubei Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Hubei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Telecommunication Branch of State Grid Hubei Electric Power Co Ltd
Priority to CN202211429707.5A
Publication of CN115499849A
Application granted
Publication of CN115499849B
Current legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/18 Network planning tools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 Network topologies
    • H04W84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application relates to a method for cooperation between a wireless access point and a reconfigurable intelligent surface, comprising the following steps: building a device communication architecture based on the power Internet of Things; designing, according to the established architecture, a corresponding access point and reconfigurable intelligent surface cooperation method that aims to maximize system energy efficiency while meeting the quality-of-service requirements of massive numbers of power Internet of Things devices in terms of data transmission rate and reliability; and having each access point cooperate with the reconfigurable intelligent surfaces according to the trained model, so as to meet the access requirements of the massive devices in the power Internet of Things. The method models the large-scale wireless communication network as a graph and reduces its dimensionality with a graph embedding method to obtain an efficient graph representation, which can effectively reduce model training complexity and enable highly customized communication.

Description

Wireless access point and reconfigurable intelligent surface cooperation method
Technical Field
The application belongs to the technical field of power Internet of things, and particularly relates to a wireless access point and reconfigurable intelligent surface cooperation method.
Background
In recent years, with the rapid development of the power Internet of Things, massive numbers of devices have been deployed at its network edge. Because the power network is complex and large, managing and controlling it by manpower alone is difficult and costly, so new information and communication technologies need to be introduced to improve the operating performance and the management and control efficiency of the power system. To realize intelligent management and control of the power Internet of Things, the configuration status and performance of the power network need to be sensed and measured in real time. The power Internet of Things therefore needs to support both the access of Internet of Things devices at the network edge and the transmission of massive data, so that its efficient and reliable operation is guaranteed. With the continuous development of information and communication technology, the new generation of mobile communication technology can provide high-speed and stable service when a large number of power devices access the power network. However, owing to the heterogeneity of network edge devices, highly customized and intelligent communication, that is, dynamic configuration of network resources to support ultra-dense connections, cannot yet be realized.
The reconfigurable intelligent surface is a new and revolutionary technology that can intelligently reconfigure the wireless propagation environment by integrating a large number of low-cost passive reflective elements on a planar surface, thereby significantly improving the performance of wireless communication networks. It makes high customization possible: through highly controllable and intelligent signal reflection it reconfigures the wireless propagation environment, providing a new degree of freedom for further improving wireless link performance and paving the way for an intelligent, programmable wireless environment. With reconfigurable intelligent surface technology, hybrid spatial beams can be flexibly configured through cooperation between the wireless access point and the reconfigurable intelligent surface, data transmission can be enhanced on demand, interference can be flexibly suppressed, and efficient hybrid spatial-domain and power-domain multiplexing can be performed, so that efficient customized and intelligent communication can be achieved. Therefore, in a power Internet of Things scenario with a heterogeneous power grid and massive numbers of devices, an effective wireless access point and reconfigurable intelligent surface cooperation technology urgently needs to be designed to realize highly customized and intelligent communication.
Disclosure of Invention
The embodiments of the application aim to provide a method for cooperation between a wireless access point and a reconfigurable intelligent surface, in which the wireless communication network is modeled as a graph and an embedded representation of the network is obtained with a graph embedding method. Graph embedding effectively yields a low-dimensional representation of the graph, which reduces model training complexity and enables highly customized communication.
In order to achieve the above purpose, the present application provides the following technical solutions:
the embodiment of the application provides a method for cooperation between a wireless access point and a reconfigurable intelligent surface, which is characterized by comprising the following steps:
step 1: building a device communication architecture based on the power Internet of Things, wherein the network architecture comprises M pre-installed access points and J reconfigurable intelligent surfaces; the cooperative relationships between each access point and its adjacent access points and reconfigurable intelligent surfaces are modeled as interactions between agents, that is, as the edges of the graph neural network input; the input topology of a message passing graph neural network is thereby constructed, and an embedded representation of the topology is obtained through the message passing graph neural network, so as to serve the power Internet of Things terminals;
step 2: according to the established device communication architecture based on the power Internet of Things, designing a corresponding access point and reconfigurable intelligent surface cooperation method that aims to maximize system energy efficiency while meeting the quality-of-service requirements of massive numbers of power Internet of Things devices in terms of data transmission rate and reliability;
step 3: based on the access point and reconfigurable intelligent surface cooperation method provided in step 2, each access point cooperates with the reconfigurable intelligent surfaces according to the trained model, so as to meet the access requirements of the massive devices in the power Internet of Things.
The step 1 is specifically as follows:
step 1: in the device communication architecture of the power Internet of Things, the pre-installed access points in the network are represented as
Figure 127484DEST_PATH_IMAGE001
and the reconfigurable intelligent surfaces in the network are represented as
Figure 687034DEST_PATH_IMAGE002
The M wireless access points and J reconfigurable intelligent surfaces are treated as distinct agent nodes, that is, as the nodes of the graph neural network input. The access information of the power Internet of Things devices and the hybrid spatial beam configuration between the wireless access points and the reconfigurable intelligent surfaces are taken as features of the graph topology and input into a message passing graph neural network, whose message passing mechanism yields a stable graph embedding of the node features.
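The graph construction described in step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `AgentGraph` class, the `build_graph` helper, and the example link lists are hypothetical names introduced here, and every cooperative relation is assumed to produce one directed edge in each direction.

```python
from dataclasses import dataclass, field


@dataclass
class AgentGraph:
    """Directed graph whose nodes are access point (AP) and RIS agents."""
    num_aps: int
    num_ris: int
    edges: set = field(default_factory=set)  # directed (src, dst) pairs

    def nodes(self):
        # M access points followed by J reconfigurable intelligent surfaces.
        aps = [("AP", m) for m in range(self.num_aps)]
        ris = [("RIS", j) for j in range(self.num_ris)]
        return aps + ris

    def add_cooperation(self, a, b):
        # Assumption: a cooperative relation is stored as two directed edges.
        self.edges.add((a, b))
        self.edges.add((b, a))

    def neighbors(self, node):
        return sorted(dst for src, dst in self.edges if src == node)


def build_graph(num_aps, num_ris, ap_links, ap_ris_links):
    """Build the input topology from AP-AP and AP-RIS cooperation lists."""
    graph = AgentGraph(num_aps, num_ris)
    for m, n in ap_links:
        graph.add_cooperation(("AP", m), ("AP", n))
    for m, j in ap_ris_links:
        graph.add_cooperation(("AP", m), ("RIS", j))
    return graph


# Hypothetical toy network: 3 access points and 2 surfaces.
g = build_graph(3, 2, ap_links=[(0, 1), (1, 2)], ap_ris_links=[(0, 0), (2, 1)])
```

Node and edge features (channel information, queue state, beam configuration) would be attached to this topology before it is passed to the message passing graph neural network.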
The step 2 is specifically as follows:
step 2.1: to dynamically maximize the system energy efficiency of the cooperation between the wireless access points and the reconfigurable intelligent surfaces, the objective function of the system can be expressed as:
Figure 342138DEST_PATH_IMAGE003
wherein
Figure 206188DEST_PATH_IMAGE004
represents the network energy efficiency of time slot t, and
Figure 394462DEST_PATH_IMAGE005
represents the user parameters. Combining reconfigurable intelligent surface unit selection, coordinated discrete phase shift control, and the power allocation strategy, the long-term energy efficiency optimization problem is modeled as a decentralized partially observable Markov decision process. After this conversion, the optimization function is as follows:
Figure 561132DEST_PATH_IMAGE007
wherein
Figure 532893DEST_PATH_IMAGE008
represents a positive factor that controls the trade-off between energy efficiency and transmission reliability,
Figure 076001DEST_PATH_IMAGE009
is a non-negative parameter that imposes a penalty for violating the data rate requirement,
Figure 384360DEST_PATH_IMAGE010
represents the data rate limit,
Figure 107948DEST_PATH_IMAGE011
is a fixed value in each time slot,
Figure 065540DEST_PATH_IMAGE012
represents the data rate in each time slot,
Figure 677918DEST_PATH_IMAGE013
represents the number of antennas, and
Figure 044046DEST_PATH_IMAGE014
represents the users served cooperatively by the access points and the reconfigurable intelligent surfaces.
Its global reward function can be expressed as:
Figure 942732DEST_PATH_IMAGE016
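The exact reward formula is contained in an equation image that is not reproduced in this text, so the following is only a hedged sketch of the kind of penalized reward the description implies: an energy efficiency term weighted by a positive trade-off factor, minus a non-negative penalty on data rate violations. The function names, the parameters `v`, `lam`, and `r_min`, and the linear shortfall penalty are all assumptions made for illustration.

```python
def energy_efficiency(rates, total_power):
    """Sum data rate divided by total consumed power."""
    if total_power <= 0:
        raise ValueError("total_power must be positive")
    return sum(rates) / total_power


def global_reward(rates, total_power, r_min, v=1.0, lam=0.5):
    """Assumed reward form: v * energy efficiency - lam * rate shortfall.

    v     positive trade-off factor (hypothetical name)
    lam   non-negative penalty weight (hypothetical name)
    r_min per-user data rate limit (hypothetical name)
    """
    ee = energy_efficiency(rates, total_power)
    # Only users below the rate limit contribute to the penalty.
    shortfall = sum(max(0.0, r_min - r) for r in rates)
    return v * ee - lam * shortfall


# Two users with rates 2.0 and 4.0, total power 2.0, rate limit 3.0.
reward = global_reward([2.0, 4.0], total_power=2.0, r_min=3.0)
```

When every user meets the rate limit, the penalty vanishes and the reward reduces to the weighted energy efficiency.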
step 2.2: more efficient cooperative learning is achieved by integrating two techniques, graph embedding and different rewards. The agents represent the wireless access points and the reconfigurable intelligent surfaces, and the interactions between agents represent the wireless communication environment and its communication modes. The agents and their interactions are modeled as a directed communication graph
Figure 561189DEST_PATH_IMAGE017
wherein the agents are modeled as the nodes I, the interactions between agents are modeled as the directed edges
Figure 977258DEST_PATH_IMAGE018
Figure 994630DEST_PATH_IMAGE019
represents the node features, and
Figure 408425DEST_PATH_IMAGE020
represents the edge features.
The node features of a wireless access point i include the spatial channel information from the access point to its associated devices, the queue information of the associated users, and the local action-observation history of the access point:
Figure 606188DEST_PATH_IMAGE021
The edge feature describes the interaction from agent
Figure 530675DEST_PATH_IMAGE022
to agent
Figure 169598DEST_PATH_IMAGE023
which can be expressed mathematically as:
Figure 908622DEST_PATH_IMAGE025
step 2.3: because graph nodes and edges have high-dimensional features in a large-scale network, an action generation module based on graph embedding is provided. Each distributed node
Figure 203468DEST_PATH_IMAGE022
maintains a message passing graph neural network. Similar to a multi-layer perceptron, the message passing graph neural network adopts a layered structure. In each message passing layer, each agent first transmits its embedded information to its neighboring agents, then aggregates the embedded information received from the neighboring agents and updates its local hidden state. The message passing process is as follows:
Figure 758077DEST_PATH_IMAGE026
wherein
Figure 752971DEST_PATH_IMAGE027
represents the message function, and
Figure 898782DEST_PATH_IMAGE028
represents the update operation. After the graph embedding module, agent
Figure 179459DEST_PATH_IMAGE022
uses a gated recurrent unit to predict its local action based on the output local embedding state
Figure 741022DEST_PATH_IMAGE029
The gated recurrent unit is a simplified variant of the long short-term memory network. The local embedding state is given by:
Figure 682433DEST_PATH_IMAGE030
Agent
Figure 375976DEST_PATH_IMAGE022
takes the local action
Figure 442152DEST_PATH_IMAGE031
which is sampled from the action sub-policy
Figure 368258DEST_PATH_IMAGE032
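One layer of the message passing described in step 2.3 can be sketched as follows. The concrete message and update functions are in equation images that are not reproduced here, so this sketch substitutes simple stand-ins: the `message` function scales the sender state by a scalar edge feature, aggregation is a sum, and `update` applies a tanh. All three choices, and all names, are assumptions made for illustration.

```python
import math


def message(h_src, e_feat):
    """Hypothetical message function: sender state scaled by the edge feature."""
    return [e_feat * x for x in h_src]


def update(h_self, agg):
    """Hypothetical update: tanh of own state plus the aggregated messages."""
    return [math.tanh(a + b) for a, b in zip(h_self, agg)]


def message_passing_layer(hidden, edges):
    """Run one layer.

    hidden: node -> hidden state vector
    edges:  (src, dst) -> scalar edge feature
    """
    dim = len(next(iter(hidden.values())))
    agg = {node: [0.0] * dim for node in hidden}
    # Each agent sends a message along every outgoing edge; receivers sum them.
    for (src, dst), e_feat in edges.items():
        msg = message(hidden[src], e_feat)
        agg[dst] = [a + m for a, m in zip(agg[dst], msg)]
    # Each agent updates its local hidden state from its own state and the sum.
    return {node: update(hidden[node], agg[node]) for node in hidden}


hidden = {"ap0": [1.0, 0.0], "ris0": [0.0, 1.0]}
edges = {("ap0", "ris0"): 0.5, ("ris0", "ap0"): 2.0}
new_hidden = message_passing_layer(hidden, edges)
```

Stacking several such layers lets information from agents more than one hop away reach each node, which is how a layered message passing network produces the local embedding state used for action prediction.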
step 2.4: the combined parameters of the graph embedding module and the action generation module in the distributed policy are denoted as
Figure 773962DEST_PATH_IMAGE033
The goal is to maximize the performance function:
Figure 700723DEST_PATH_IMAGE034
wherein
Figure 316512DEST_PATH_IMAGE035
is the joint state transition that follows the joint policy
Figure 954298DEST_PATH_IMAGE036
The policy gradient is computed based on the advantage function and is given by:
Figure 103258DEST_PATH_IMAGE038
wherein
Figure 371559DEST_PATH_IMAGE039
is the actual input of the graph embedding, and
Figure 740223DEST_PATH_IMAGE040
represents the temporal-difference advantage, given by:
Figure 417586DEST_PATH_IMAGE041
wherein
Figure 922517DEST_PATH_IMAGE042
represents the global state value, and
Figure 922571DEST_PATH_IMAGE043
represents the global state-action value. To solve the credit assignment problem during training, the distributed network is trained using value decomposition, in which the global state value
Figure 653898DEST_PATH_IMAGE044
is decomposed into a combination of local state values through a mixing function, as shown in the following equation:
Figure 492541DEST_PATH_IMAGE046
wherein
Figure 986668DEST_PATH_IMAGE047
represents the local state value of agent
Figure 924668DEST_PATH_IMAGE022
In the centralized training process, each agent receives a different reward by evaluating its contribution to the improvement of the global reward based on its local graph embedding features, which further facilitates coordination between agents.
Figure 500880DEST_PATH_IMAGE048
denotes the weight parameters of the distributed network, which are shared among agents, and
Figure 549739DEST_PATH_IMAGE049
denotes the mixing network
Figure 406093DEST_PATH_IMAGE050
Through mini-batch gradient descent, the distributed network and the mixing network are optimized so that the following loss is minimized:
Figure 842890DEST_PATH_IMAGE051
wherein
Figure 017651DEST_PATH_IMAGE052
is the n-step return bootstrapped from the last state, with n upper-bounded by T. The parameters of the mixing network can be updated by the following formula:
Figure 696632DEST_PATH_IMAGE053
wherein
Figure 702765DEST_PATH_IMAGE054
is the learning rate of the mixing network update. Further, the weight parameters of the non-output layers in the distributed network are shared, and the combined weight parameters of the distributed network are denoted as
Figure 421716DEST_PATH_IMAGE055
The gradient with respect to
Figure 146089DEST_PATH_IMAGE056
can be calculated as:
Figure 864646DEST_PATH_IMAGE057
The update rule for the distributed network can be derived as:
Figure 223821DEST_PATH_IMAGE058
wherein
Figure 877788DEST_PATH_IMAGE059
and
Figure 918818DEST_PATH_IMAGE060
respectively represent the policy improvement learning rate and the critic learning rate.
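The quantities named in step 2.4 can be illustrated with their textbook forms. The patent's own formulas are in equation images that are not reproduced here, so the following is a sketch under stated assumptions: the n-step return and one-step temporal-difference advantage use the standard discounted definitions with a hypothetical discount factor `gamma`, the mixing function is assumed to be a non-negative weighted sum of local values, and the loss is assumed to be a squared error.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted sum of n rewards plus the discounted value of the last state."""
    ret = bootstrap_value
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


def td_advantage(reward, value, next_value, gamma=0.99):
    """One-step temporal-difference advantage: r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value


def mix_values(local_values, weights, bias=0.0):
    """Assumed mixing function: weighted sum with non-negative weights.

    Taking abs() of each weight keeps the global value monotonic in every
    local value, a common choice in value decomposition methods.
    """
    return sum(abs(w) * v for w, v in zip(weights, local_values)) + bias


def critic_loss(target, predicted):
    """Assumed squared-error loss between the return target and the mixed value."""
    return (target - predicted) ** 2


target = n_step_return([1.0, 1.0], bootstrap_value=2.0, gamma=0.5)
advantage = td_advantage(1.0, value=2.0, next_value=4.0, gamma=0.5)
global_value = mix_values([1.0, 2.0], weights=[0.5, -1.0])
loss = critic_loss(target, global_value)
```

In a full implementation the local values and mixing weights would be produced by the distributed and mixing networks, and the loss would be minimized by mini-batch gradient descent as the text describes.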
The step 3 is specifically as follows:
step 3.1: the power Internet of Things data obtained by actual observation are input, as the agents' observation states and the environment information, into the graph-embedding-based network update algorithm; the network parameters are initialized, and the network learning rates are initialized
Figure 050853DEST_PATH_IMAGE061
step 3.2: a batch of data is drawn from the experience pool
Figure 530114DEST_PATH_IMAGE062
the policy gradient
Figure 449922DEST_PATH_IMAGE063
and the network loss
Figure 791298DEST_PATH_IMAGE064
are calculated according to the formulas derived in step 2.4, and the mixing network parameters are updated according to the mixing network parameter update formula in step 2.4;
step 3.3: the network parameters in the power Internet of Things are further updated according to the distributed network parameter update rule in step 2.4 until the network converges;
step 3.4: the trained network parameters are updated periodically, or the network is retrained and its parameters are updated when the power Internet of Things changes substantially, so as to meet the access requirements of the devices in the power Internet of Things and realize customized communication.
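The control flow of steps 3.1 to 3.3 (initialize, draw a batch from the experience pool, compute a gradient, update, repeat until convergence) can be sketched with a deliberately tiny stand-in model. Here a single scalar `theta` replaces the distributed and mixing networks, the pool holds observed returns, and the loss is a mean squared error; these simplifications, and all names and default values, are assumptions for illustration only.

```python
import random


def train(pool, theta, lr=0.1, batch_size=4, epochs=200, tol=1e-6, seed=0):
    """Fit a scalar value estimate theta to the returns stored in the pool."""
    rng = random.Random(seed)
    for _ in range(epochs):
        # Step 3.2: draw a mini-batch from the experience pool.
        batch = rng.sample(pool, min(batch_size, len(pool)))
        # Gradient of the mean squared error between theta and the batch.
        grad = sum(2.0 * (theta - g) for g in batch) / len(batch)
        # Gradient descent update with learning rate lr.
        new_theta = theta - lr * grad
        # Step 3.3: stop once the update is small enough (convergence).
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


# Hypothetical experience pool in which every stored return equals 1.0.
pool = [1.0, 1.0, 1.0, 1.0]
theta = train(pool, theta=0.0)
```

Step 3.4 would correspond to calling such a routine again, either on a schedule or when the observed network conditions drift far from the data in the pool.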
Compared with the prior art, the beneficial effects of the present application are as follows. The application provides a wireless access point and reconfigurable intelligent surface cooperation framework tailored to the requirements of the power Internet of Things, so that the access requirements of massive numbers of devices are met. Through cooperation between the wireless access points and the reconfigurable intelligent surfaces, system energy efficiency is dynamically maximized and efficient communication is achieved. In addition, the application provides a graph-embedding-based wireless network representation method, which models the large-scale wireless communication network as a graph and reduces its dimensionality with graph embedding to obtain an efficient graph representation. The method can effectively reduce model training complexity and realize highly customized communication.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a flow chart of a method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
Referring to fig. 1, the present application provides a method for a wireless access point to cooperate with a reconfigurable intelligent surface, which includes the following steps.
Step 1: building a device communication architecture based on the power Internet of Things, wherein the network architecture comprises M pre-installed access points and J reconfigurable intelligent surfaces; the cooperative relationships between each access point and its adjacent access points and reconfigurable intelligent surfaces are modeled as interactions between agents, that is, as the edges of the graph neural network input; the input topology of a message passing graph neural network is thereby constructed, and an embedded representation of the topology is obtained through the message passing graph neural network, so as to serve the power Internet of Things terminals;
step 2: according to the established device communication architecture based on the power Internet of Things, designing a corresponding access point and reconfigurable intelligent surface cooperation method that aims to maximize system energy efficiency while meeting the quality-of-service requirements of massive numbers of power Internet of Things devices in terms of data transmission rate and reliability;
step 3: based on the access point and reconfigurable intelligent surface cooperation method provided in step 2, each access point cooperates with the reconfigurable intelligent surfaces according to the trained model, so as to meet the access requirements of the massive devices in the power Internet of Things.
Preferably, the step 1 is as follows:
step 1: in the device communication architecture of the power Internet of Things, the pre-installed access points in the network are represented as
Figure 789341DEST_PATH_IMAGE001
and the reconfigurable intelligent surfaces in the network are represented as
Figure 857529DEST_PATH_IMAGE002
The M wireless access points and J reconfigurable intelligent surfaces are treated as distinct agent nodes, that is, as the nodes of the graph neural network input. The access information of the power Internet of Things devices and the hybrid spatial beam configuration between the wireless access points and the reconfigurable intelligent surfaces are taken as features of the graph topology and input into a message passing graph neural network, whose message passing mechanism yields a stable graph embedding of the node features.
Preferably, the step 2 is specifically as follows:
step 2.1: because massive numbers of devices are deployed at the network edge of the power Internet of Things, a high-performance massive-device access framework needs to be carefully designed. By designing the cooperation between the access points and the reconfigurable intelligent surfaces, hybrid beams can be flexibly and cooperatively reconfigured, so that devices access the communication network in a coordinated manner and customizable intelligent communication is realized. Therefore, to dynamically maximize the system energy efficiency of the cooperation between the wireless access points and the reconfigurable intelligent surfaces, the objective function of the system can be expressed as:
Figure 977932DEST_PATH_IMAGE065
wherein
Figure 898614DEST_PATH_IMAGE004
represents the network energy efficiency of time slot t. This objective can be modeled as a constrained Markov decision process. However, solving the problem in a centralized manner is computationally inefficient because of the large joint state-action space and the overhead of exchanging high-dimensional information between the many wireless access points and reconfigurable intelligent surfaces and a centralized controller. To address the problem in an efficient, low-complexity manner and to maximize network energy efficiency while guaranteeing diversified user performance, we model the long-term energy efficiency optimization problem as a decentralized partially observable Markov decision process that combines reconfigurable intelligent surface unit selection, coordinated discrete phase shift control, and power allocation strategies. Specifically, the partially observable Markov decision process provides a general framework for describing Markov decision processes with incomplete information, and the decentralized partially observable Markov decision process extends this framework to distributed settings.
Based on Lyapunov optimization theory, we can convert the above optimization problem into a decentralized partially observable Markov decision process, and the converted optimization function is as follows:
Figure 405075DEST_PATH_IMAGE066
wherein
Figure 298076DEST_PATH_IMAGE008
represents a positive factor that controls the trade-off between energy efficiency and transmission reliability,
Figure 353495DEST_PATH_IMAGE009
is a non-negative parameter that imposes a penalty for violating the data rate requirement,
Figure 495894DEST_PATH_IMAGE010
represents the data rate limit,
Figure 337204DEST_PATH_IMAGE011
is a fixed value in each time slot,
Figure 819133DEST_PATH_IMAGE012
represents the data rate in each time slot,
Figure 717556DEST_PATH_IMAGE013
represents the number of antennas, and
Figure 940727DEST_PATH_IMAGE014
represents the users served cooperatively by the access points and the reconfigurable intelligent surfaces.
Its global reward function can be expressed as:
Figure 756367DEST_PATH_IMAGE067
step 2.2: the optimization problem described in step 2.1 can be solved with conventional multi-agent reinforcement learning. However, because adjacent agents must exchange information to cooperate, conventional multi-agent reinforcement learning incurs high communication overhead and delay when processing high-dimensional information, and is therefore inefficient at solving highly coupled decentralized partially observable Markov decision process problems. We extend the centralized training and decentralized execution scheme commonly used in existing multi-agent reinforcement learning algorithms, and achieve more efficient cooperative learning by integrating two techniques, graph embedding and different rewards. The agents represent the wireless access points and the reconfigurable intelligent surfaces. The interactions between agents represent the wireless communication environment and its communication modes. The agents and their interactions are modeled as a directed communication graph
Figure 633146DEST_PATH_IMAGE017
wherein the agents are modeled as the nodes I, the interactions between agents are modeled as the directed edges
Figure 203936DEST_PATH_IMAGE018
Figure 147359DEST_PATH_IMAGE019
represents the node features, and
Figure 766690DEST_PATH_IMAGE020
represents the edge features.
The node features of a wireless access point i include the spatial channel information from the access point to its associated devices, the queue information of the associated users, and the local action-observation history of the access point:
Figure 879003DEST_PATH_IMAGE021
The edge feature describes the interaction from agent
Figure 856579DEST_PATH_IMAGE022
to agent
Figure 195288DEST_PATH_IMAGE023
which can be expressed mathematically as:
Figure 913583DEST_PATH_IMAGE068
step 2.3: since graph nodes and edges have high-dimensional characteristics in a large-scale network, an action generation module based on graph embedding is provided. The module utilizes the low-dimensional embedding characteristic of the message transfer graph neural network learning directed graph, can effectively improve the generalization capability of the network and enhance the cooperation capability between the wireless access point and the reconfigurable intelligent surface, and simultaneously only needs lower information exchange overhead.
We are at each distributed node
Figure 490189DEST_PATH_IMAGE022
A messaging graph neural network is maintained. Similar to the multi-tier perceptrons, the messaging graph neural network employs a hierarchical structure. Within each messaging graph neural network layer, each agent first transmits embedded information to its neighboring agents, then aggregates the embedded information from the neighboring agents and updates its local hidden state, the messaging process is as follows:
Figure 169825DEST_PATH_IMAGE026
where
Figure 464672DEST_PATH_IMAGE027
denotes the message function and
Figure 986658DEST_PATH_IMAGE028
denotes the update operation. After the graph embedding module, agent
Figure 542404DEST_PATH_IMAGE022
uses a gated recurrent unit to predict its local action from the output local embedding state
Figure 330625DEST_PATH_IMAGE029
where the gated recurrent unit is a simplified variant of the long short-term memory network. The local embedding state is given by:
Figure 909505DEST_PATH_IMAGE030
Agent
Figure 533384DEST_PATH_IMAGE022
takes the local action
Figure 848697DEST_PATH_IMAGE031
which is sampled from the following sub-policy:
Figure 837512DEST_PATH_IMAGE032
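The transmit, aggregate, and update cycle of step 2.3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented network: the `message` and `update` functions are hypothetical stand-ins (a scalar edge scaling and a tanh), the gated recurrent unit is omitted, and the final embedding is used directly as action logits purely to show the sampling step.

```python
import math
import random


def message(h_src, e_feat):
    # Hypothetical message function: neighbor state scaled by a scalar edge feature.
    return [e_feat * x for x in h_src]


def update(h_self, agg):
    # Hypothetical update operation: bounded sum of own state and aggregated messages.
    return [math.tanh(a + b) for a, b in zip(h_self, agg)]


def mp_layer(hidden, edges):
    """One message-passing layer. hidden: id -> vector; edges: (src, dst) -> scalar."""
    new_hidden = {}
    for i, h_i in hidden.items():
        agg = [0.0] * len(h_i)
        for (src, dst), e in edges.items():
            if dst == i:  # aggregate only the messages addressed to agent i
                agg = [a + m for a, m in zip(agg, message(hidden[src], e))]
        new_hidden[i] = update(h_i, agg)
    return new_hidden


def sample_action(logits, rng):
    """Sample an action index from a softmax policy over the given logits."""
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


hidden = {"ap0": [0.5, -0.2], "ris0": [0.1, 0.3]}
edges = {("ap0", "ris0"): 1.0}
for _ in range(2):  # two stacked message-passing layers
    hidden = mp_layer(hidden, edges)
action = sample_action(hidden["ris0"], random.Random(0))
```

Each layer reads only the previous layer's hidden states, so every agent needs just its in-neighbors' embeddings per layer, which is the source of the low information-exchange overhead mentioned above.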
Step 2.4: The combined parameters of the graph embedding module and the action generation module in the distributed policy are denoted
Figure 467470DEST_PATH_IMAGE033
The goal is to maximize the performance function:
Figure 832724DEST_PATH_IMAGE034
where
Figure 2543DEST_PATH_IMAGE035
is the joint state transition following the joint policy
Figure 490156DEST_PATH_IMAGE036
The policy gradient is therefore computed from the advantage function and is given by:
Figure 43628DEST_PATH_IMAGE037
where
Figure 182879DEST_PATH_IMAGE039
is the actual input of the graph embedding and
Figure 708669DEST_PATH_IMAGE040
denotes the temporal-difference advantage, given by:
Figure 537823DEST_PATH_IMAGE041
where
Figure 047432DEST_PATH_IMAGE042
denotes the global state value and
Figure 255953DEST_PATH_IMAGE043
denotes the global state-action value. To solve the credit assignment problem during training, the distributed network is trained with value decomposition, which decomposes the global state value
Figure 760884DEST_PATH_IMAGE044
into a combination of local state values through a mixing function, as shown in the following equation:
Figure 760939DEST_PATH_IMAGE069
where
Figure 492266DEST_PATH_IMAGE047
denotes the local state value of agent
Figure 973319DEST_PATH_IMAGE022
In the centralized training process, each agent receives a different reward, obtained by evaluating its contribution to the improvement of the global reward based on its local graph embedding features, which further promotes coordination between agents. Let
Figure 332756DEST_PATH_IMAGE048
denote the weight parameters of the distributed network, which are shared among agents, and let
Figure 270756DEST_PATH_IMAGE049
denote the weights of the mixing network
Figure 456756DEST_PATH_IMAGE050
The distributed network and the mixing network are optimized by mini-batch gradient descent to minimize the following loss:
Figure 771194DEST_PATH_IMAGE070
where
Figure 510040DEST_PATH_IMAGE052
is the n-step return from the last state, with n bounded above by T. The parameters of the mixing network can therefore be updated by:
Figure 87783DEST_PATH_IMAGE071
where
Figure 652756DEST_PATH_IMAGE054
is the learning rate of the mixing network update. To reduce complexity, the weight parameters of the non-output layers in the distributed network are further shared, and the combined weight parameters of the distributed network are denoted
Figure 941524DEST_PATH_IMAGE055
Accordingly, the gradient with respect to
Figure 275553DEST_PATH_IMAGE056
can be calculated as:
Figure 56821DEST_PATH_IMAGE072
Thus, the update rule for the distributed network can be derived as:
Figure 515615DEST_PATH_IMAGE073
where
Figure 280178DEST_PATH_IMAGE059
and
Figure 406397DEST_PATH_IMAGE060
denote the policy improvement learning rate and the critic learning rate, respectively.
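The quantities used in step 2.4 (the n-step return, a mixed global state value built from local state values, an advantage, and a squared-error loss) can be illustrated with the following sketch. It rests on assumptions that are not in the source: the mixing function is taken to be a non-negative weighted sum, the advantage is estimated as the n-step return minus the mixed state value (a common estimator that may differ from the formula in the patent), and the discount factor `gamma` and all numbers are hypothetical.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted return over the given rewards, bootstrapped from the last state."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mix_values(local_values, weights, bias=0.0):
    """Hypothetical mixing function: non-negative weighted sum of local state values."""
    return sum(abs(w) * v for w, v in zip(weights, local_values)) + bias


def critic_loss(returns, global_values):
    """Mean squared error between n-step returns and mixed global state values."""
    return sum((g - v) ** 2 for g, v in zip(returns, global_values)) / len(returns)


rewards = [1.0, 0.0, 2.0]                       # three observed rewards (n = 3)
ret = n_step_return(rewards, bootstrap_value=4.0, gamma=0.5)
v_global = mix_values([0.5, 1.5], weights=[1.0, 0.5])
advantage = ret - v_global                      # assumed advantage estimator
loss = critic_loss([ret], [v_global])
```

Forcing the mixing weights to be non-negative keeps the global value monotone in each local value, which is one common way to make a per-agent improvement consistent with a global improvement.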
Preferably, step 3 is specifically as follows:
Step 3.1: The power Internet of Things data obtained by actual observation is input, as the agents' observation states and environment information, into the graph-embedding-based network update algorithm; the network parameters are initialized, and the network learning rates are initialized:
Figure 686462DEST_PATH_IMAGE061
Step 3.2: A batch of data
Figure 835814DEST_PATH_IMAGE062
is drawn from the experience pool. The policy gradient
Figure 358063DEST_PATH_IMAGE063
and the network loss
Figure 837323DEST_PATH_IMAGE064
are calculated according to the formulas derived in step 2.4, and the mixing network parameters are updated using the mixing network parameter update formula in step 2.4.
Step 3.3: The network parameters in the power Internet of Things are further updated according to the distributed network parameter update rule in step 2.4 until the network converges.
Step 3.4: The trained network parameters are updated periodically, or are retrained and updated when the power Internet of Things changes substantially. In this way, the access requirements of the devices in the power Internet of Things are met and customized communication is realized.
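The control flow of steps 3.1 to 3.3 (initialize, draw a batch from the experience pool, take a gradient step, repeat until convergence) can be sketched on a toy problem. Everything specific here is an assumption made for illustration: a single scalar parameter `w` replaces the networks, the loss is a plain squared error, and the pool contents, learning rate, and tolerance are hypothetical.

```python
import random


def train(pool, lr=0.1, batch_size=4, tol=1e-6, max_iters=10_000, seed=0):
    """Fit a scalar parameter w to the pool's targets by mini-batch gradient descent."""
    rng = random.Random(seed)
    w = 0.0  # step 3.1: initialize the parameter
    for it in range(max_iters):
        batch = rng.sample(pool, batch_size)  # step 3.2: draw a batch from the pool
        # Batch-averaged gradient of 0.5 * (w * x - y) ** 2 with respect to w.
        grad = sum((w * x - y) * x for x, y in batch) / batch_size
        w_new = w - lr * grad                 # gradient-descent parameter update
        if abs(w_new - w) < tol:              # step 3.3: stop once updates are tiny
            return w_new, it + 1
        w = w_new
    return w, max_iters


# Hypothetical experience pool: (observation, target) pairs with target = 2 * observation.
pool = [(x / 10.0, 2.0 * x / 10.0) for x in range(1, 11)]
w, iters = train(pool)
```

Step 3.4 would correspond to calling `train` again, either on a schedule or when the contents of the pool change substantially.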
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (1)

1. A method for cooperation between a wireless access point and a reconfigurable intelligent surface, characterized by comprising the following steps:
step 1: establishing a device communication architecture based on the power Internet of Things, wherein the device communication architecture comprises M pre-installed access points and J reconfigurable intelligent surfaces; the cooperative relationship between each access point and its adjacent access points and reconfigurable intelligent surfaces is modeled as interaction between agents, that is, as edges in the graph neural network input; an input topology of a message-passing graph neural network is constructed, and the message-passing graph neural network is used to obtain an embedded representation of the topology so as to serve power Internet of Things terminals;
step 2: according to the established device communication architecture based on the power Internet of Things, designing a corresponding access point and reconfigurable intelligent surface cooperation method that takes maximizing system energy efficiency as its objective while meeting the quality-of-service requirements, in terms of data transmission rate and reliability, of massive devices in the power Internet of Things;
step 3: based on the access point and reconfigurable intelligent surface cooperation method designed in step 2, each access point cooperates with the reconfigurable intelligent surfaces according to the trained model so as to meet the access requirements of massive devices in the power Internet of Things;
step 1 is specifically as follows:
in the device communication architecture of the power Internet of Things, the pre-installed access points in the network are denoted
Figure QLYQS_1
and the reconfigurable intelligent surfaces in the network are denoted
Figure QLYQS_2
the M wireless access points and the J reconfigurable intelligent surfaces are represented as distinct agent nodes, that is, as nodes in the graph neural network input; the access information of the power Internet of Things devices and the hybrid spatial beam configuration between the multiple wireless access points and the multiple reconfigurable intelligent surfaces are taken as features in the graph topology and input to the message-passing graph neural network, and a stable graph-embedded representation of the node features is obtained through the message-passing mechanism of the message-passing graph neural network;
step 2 is specifically as follows:
step 2.1: modeling the system energy efficiency optimization problem as a decentralized partially observable Markov decision process;
in order to dynamically maximize the system energy efficiency of the cooperation between the wireless access points and the reconfigurable intelligent surfaces, the objective function of the system can be expressed as:
Figure QLYQS_3
where
Figure QLYQS_4
denotes the network energy efficiency in time slot t and
Figure QLYQS_5
denotes the user parameters; combining the selection of reconfigurable intelligent surface units, the coordination of discrete phase-shift control, and the power allocation strategy, the above system energy efficiency optimization problem is modeled as a decentralized partially observable Markov decision process, and after this conversion the optimization function is:
Figure QLYQS_6
where
Figure QLYQS_7
denotes a positive coefficient that controls the trade-off between energy efficiency and transmission reliability,
Figure QLYQS_8
is a non-negative parameter that imposes a penalty for violating the data rate limit,
Figure QLYQS_9
denotes the data rate limit,
Figure QLYQS_10
is a fixed value in each time slot,
Figure QLYQS_11
denotes the data rate in each time slot,
Figure QLYQS_12
denotes the number of antennas, and
Figure QLYQS_13
denotes the users jointly served by the access point and the reconfigurable intelligent surface;
the global reward function can be expressed as:
Figure QLYQS_14
step 2.2: more efficient cooperative learning is achieved by integrating two techniques, graph embedding and different rewards;
the agents represent the wireless access points and the reconfigurable intelligent surfaces, the interactions between agents represent the wireless communication environment and its communication modes, and the agents and the interactions between them are modeled as a directed communication graph
Figure QLYQS_15
where each agent is modeled as a node i and each interaction between agents is modeled as a directed edge
Figure QLYQS_16
Figure QLYQS_17
denotes the node features, and
Figure QLYQS_18
denotes the edge features;
the node characteristics of wireless access point i include spatial channel information of the access point to its associated devices, queue information of associated users, and local action observation history of the access point:
Figure QLYQS_19
the edge features characterize the interaction from agent
Figure QLYQS_20
to agent
Figure QLYQS_21
and can be expressed mathematically as:
Figure QLYQS_22
step 2.3: maintaining a message passing graph neural network at each distributed node i, wherein in each message passing graph neural network layer, each agent firstly transmits embedded information to adjacent agents, and then aggregates the embedded information from the adjacent agents and updates the local hidden state of the agents;
the message passing process is shown as follows:
Figure QLYQS_23
where
Figure QLYQS_24
denotes the message function and
Figure QLYQS_25
denotes the update operation; after the graph embedding module, agent
Figure QLYQS_26
uses a gated recurrent unit to predict its local action from the output local embedding state
Figure QLYQS_27
where the gated recurrent unit is a simplified variant of the long short-term memory network, and the local embedding state is given by:
Figure QLYQS_28
agent
Figure QLYQS_29
takes the local action
Figure QLYQS_30
which is obtained by sampling from the following sub-policy:
Figure QLYQS_31
step 2.4: embedding graphs in distributed policiesThe combined parameters of the module and the action generating module are expressed as
Figure QLYQS_32
Our goal is to maximize the performance function:
Figure QLYQS_33
wherein
Figure QLYQS_34
Is to follow a union policy->
Figure QLYQS_35
Based on the dominance function, a policy gradient is calculated, which is given by: />
Figure QLYQS_36
where
Figure QLYQS_37
is the actual input of the graph embedding and
Figure QLYQS_38
denotes the temporal-difference advantage, given by:
Figure QLYQS_39
where
Figure QLYQS_40
denotes the global state value and
Figure QLYQS_41
denotes the global state-action value; to solve the credit assignment problem during training, the distributed network is trained with value decomposition, which decomposes the global state value
Figure QLYQS_42
into a combination of local state values through a mixing function, as shown in the following equation:
Figure QLYQS_43
where
Figure QLYQS_44
denotes the local state value of agent
Figure QLYQS_45
in the centralized training process, each agent receives a different reward, obtained by evaluating its contribution to the improvement of the global reward based on its local graph embedding features, so as to further promote coordination between agents;
Figure QLYQS_46
denotes the weight parameters of the distributed network, which are shared among agents, and
Figure QLYQS_47
denotes the weights of the mixing network
Figure QLYQS_48
the distributed network and the mixing network are optimized by mini-batch gradient descent so that the following loss is minimized:
Figure QLYQS_49
where
Figure QLYQS_50
is the
Figure QLYQS_51
-step return from the last state, and
Figure QLYQS_52
has an upper limit of T; the parameters of the mixing network can be updated by:
Figure QLYQS_53
where
Figure QLYQS_54
is the learning rate of the mixing network update; the weight parameters of the non-output layers in the distributed network are further shared, and the combined weight parameters of the distributed network are denoted
Figure QLYQS_55
the gradient with respect to
Figure QLYQS_56
can be calculated as:
Figure QLYQS_57
the update rule for the distributed network can be derived as:
Figure QLYQS_58
where
Figure QLYQS_59
and
Figure QLYQS_60
denote the policy improvement learning rate and the critic learning rate, respectively;
step 3 is specifically as follows:
step 3.1: the power Internet of Things data obtained by actual observation is input, as the agents' observation states and environment information, into the graph-embedding-based network update algorithm; the network parameters are initialized, and the network learning rates are initialized:
Figure QLYQS_61
step 3.2: a batch of data B is drawn from the experience pool; the policy gradient
Figure QLYQS_62
and the network loss
Figure QLYQS_63
are calculated according to the formulas derived in step 2.4, and the mixing network parameters are updated using the mixing network parameter update formula in step 2.4;
step 3.3: the network parameters in the power Internet of Things are further updated according to the distributed network parameter update rule in step 2.4 until the network converges;
step 3.4: the trained network parameters are updated periodically, or are retrained and updated when the power Internet of Things changes substantially, so that the access requirements of the devices in the power Internet of Things are met and customized communication is realized.
CN202211429707.5A 2022-11-16 2022-11-16 Wireless access point and reconfigurable intelligent surface cooperation method Active CN115499849B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211429707.5A CN115499849B (en) 2022-11-16 2022-11-16 Wireless access point and reconfigurable intelligent surface cooperation method

Publications (2)

Publication Number Publication Date
CN115499849A CN115499849A (en) 2022-12-20
CN115499849B true CN115499849B (en) 2023-04-07

Family

ID=85115737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211429707.5A Active CN115499849B (en) 2022-11-16 2022-11-16 Wireless access point and reconfigurable intelligent surface cooperation method

Country Status (1)

Country Link
CN (1) CN115499849B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN113472419A (en) * 2021-06-23 2021-10-01 西北工业大学 Safe transmission method and system based on space-based reconfigurable intelligent surface
CN115103372A (en) * 2022-06-17 2022-09-23 东南大学 Multi-user MIMO system user scheduling method based on deep reinforcement learning
CN115310775A (en) * 2022-07-13 2022-11-08 武汉大学 Multi-agent reinforcement learning rolling scheduling method, device, equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3776370A1 (en) * 2018-05-18 2021-02-17 Deepmind Technologies Limited Graph neural network systems for behavior prediction and reinforcement learning in multple agent environments
CN111612126B (en) * 2020-04-18 2024-06-21 华为技术有限公司 Method and apparatus for reinforcement learning
US11546022B2 (en) * 2020-04-29 2023-01-03 The Regents Of The University Of California Virtual MIMO with smart surfaces
JP7307825B2 (en) * 2021-02-01 2023-07-12 株式会社Nttドコモ Method and apparatus for user location and tracking using radio signals reflected by reconfigurable smart surfaces
CN113573293B (en) * 2021-07-14 2022-10-04 南通大学 Intelligent emergency communication system based on RIS
CN114422056B (en) * 2021-12-03 2023-05-23 北京航空航天大学 Space-to-ground non-orthogonal multiple access uplink transmission method based on intelligent reflecting surface
CN114286369B (en) * 2021-12-28 2024-02-27 杭州电子科技大学 AP and RIS joint selection method of RIS auxiliary communication system
CN114466388B (en) * 2022-02-16 2023-08-08 北京航空航天大学 Intelligent super-surface-assisted wireless energy-carrying communication method
CN115333143B (en) * 2022-07-08 2024-05-07 国网黑龙江省电力有限公司大庆供电公司 Deep learning multi-agent micro-grid cooperative control method based on double neural networks
CN115146538A (en) * 2022-07-11 2022-10-04 河海大学 Power system state estimation method based on message passing graph neural network

Also Published As

Publication number Publication date
CN115499849A (en) 2022-12-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant