CN116847425A - Multi-resource route optimization method based on high-dimensional data joint optimization - Google Patents
- Publication number
- CN116847425A (application number CN202310736074.0A)
- Authority
- CN
- China
- Prior art keywords
- node
- network
- resource
- matrix
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W40/00—Communication routing or communication path finding
- H04W40/02—Communication route or path selection, e.g. power-based or shortest path routing
- H04W40/04—Communication route or path selection, e.g. power-based or shortest path routing based on wireless node resources
- H04W40/12—Communication route or path selection, e.g. power-based or shortest path routing based on transmission quality or channel quality
- H04W40/24—Connectivity information management, e.g. connectivity discovery or connectivity update
- H04W40/248—Connectivity information update
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/02—Topology update or discovery
- H04L45/08—Learning-based routing, e.g. using neural networks or artificial intelligence
Abstract
The application discloses a multi-resource route optimization method based on high-dimensional data joint optimization, belonging to the field of network route optimization. The application determines the link relations among nodes from the topological structure information of the network to construct an adjacency matrix, and constructs a multidimensional resource matrix of the network based on the bandwidth, time delay, energy consumption and storage space resources of the network and the states of its nodes. The graph neural network is trained with the adjacency matrix and the multidimensional resource matrix to obtain feature vectors carrying node and graph feature information; the feature vectors are used as the state representation of deep reinforcement learning, and actions are selected according to the current state to update the environment state of the ad hoc network. Feedback information is acquired to update the state cost function and the action cost function of deep reinforcement learning, and the action with the maximum value in each state is selected on that basis. The application solves the problem of network resource allocation in the ad hoc network, improves the utilization rate of network resources and the communication efficiency of the network, and provides a better routing scheme for the ad hoc network.
Description
Technical Field
The application relates to the field of network route optimization, and in particular to a multi-resource route optimization method based on high-dimensional data joint optimization.
Background
At present, with the rapid development of mobile communication technology and intelligent equipment, the scale of networks is continuously expanding and people's demands on communication networks are gradually increasing, so the performance requirements placed on the network are ever higher; route optimization of networks is therefore a problem that must be solved. The goal of route optimization is to select the optimal communication path in the communication network so as to maximize the quality and efficiency of communication. Traditional route optimization methods generally compute a routing strategy from collected network traffic statistics and then carry out route configuration, and most of them consider only a single routing objective. For example, shortest-path route optimization uses the Dijkstra algorithm or the Bellman-Ford algorithm to calculate the shortest path from the current node to the target node, ignoring the global topology and resource utilization of the whole network, which leads to unbalanced resource distribution and network congestion. Such methods generally perform route calculation under a fixed network topology and resource state, and cannot meet the dynamic and real-time requirements of network environments in which topology and resource states keep changing.
Existing route optimization methods that use deep learning can comprehensively optimize routing decisions by considering multiple network resources and QoS indicators, dynamically adjust routes according to the characteristics and requirements of different flows, make effective use of network resources, and reduce network congestion and resource waste. Such a system can adjust dynamically according to the real-time network state and service requirements so as to adapt to changes in network load and demand. However, designing and implementing such a route optimization system requires consideration of many factors, including network topology, traffic characteristics and QoS requirements, which greatly increases system complexity. On the other hand, the optimization must be balanced among different quality-of-service requirements; different application scenarios have different demands, and it is difficult to find a suitable balance point among the priorities of bandwidth, time delay and so on. Before the model is trained, a large amount of data is required for effective learning and generalization, and in route optimization it is very difficult to obtain accurate and representative training data; training also requires a long time and large amounts of computing resources. Faced with large-scale networks and dynamically changing topologies, it is difficult to trade off optimization objectives across multiple resources and provide an accurate route optimization strategy.
Disclosure of Invention
The application aims to provide a multi-resource route optimization method based on high-dimensional data joint optimization which, unlike the single measurement standard of the prior art, optimizes routes across four network attributes: throughput, packet loss rate, congestion state and channel state. Modeling is realized with a multidimensional resource matrix in the ad hoc network, multiple resource dimensions are considered jointly, changes of the network state are adapted to flexibly, the optimal routing strategy in a complex network is solved efficiently, and network performance and resource utilization are improved.
In order to realize the method, the application provides a multi-resource route optimization method based on high-dimensional data joint optimization, which integrates network topology and multi-dimensional resources to improve the performance of a route strategy, and continuously learns and adjusts an optimal route decision through self-adaptive learning of deep reinforcement learning. The method specifically comprises the following steps:
step S1: acquiring the current network state of the ad hoc network and the topology information of the network, and constructing an adjacency matrix according to the connection states among the nodes; respectively constructing resource matrixes of each network resource based on link bandwidths, time delays, energy consumption and storage spaces of nodes in the current network state, and combining the constructed matrixes to form a multidimensional resource matrix;
step S2: transmitting the constructed adjacency matrix and the multidimensional resource matrix into a graph neural network, and processing to obtain feature vectors containing feature information of each node and topological structure attributes of the graph;
step S3: the feature vector output by the graph neural network is used as a state representation of deep reinforcement learning, the value of the current state is calculated by using the feature vector, an action is selected by using a strategy according to the current state and is sent to the environment, the action is executed so as to realize interaction between an intelligent agent and the environment, and feedback of the environment is stored in an experience pool for subsequent strategy updating;
step S4: and updating the state cost function and the action cost function according to the feedback of the network environment, and optimizing the state cost function and the action cost function through deep reinforcement learning to obtain the strategy of maximum accumulated return, namely the optimal strategy.
As a preferred technical scheme of the present application, the specific steps of the step S1 are as follows:
step S11: obtaining a network topology diagram of the ad hoc network by using a network protocol analysis tool, and constructing an adjacency matrix by using an adjacency matrix representation method according to the topology diagram;
creating a matrix of size N x N, where N represents the number of nodes in the ad hoc network, and if there is a connection between node i and node j, setting the ith row and jth column elements of the matrix to 1; otherwise, set to 0; the diagonal elements of the adjacency matrix are set to 0 or other value;
step S12: aiming at each node in the target ad hoc network, respectively constructing a corresponding resource matrix according to the bandwidth, time delay, energy consumption and storage space of the node;
s121, constructing a bandwidth resource matrix: taking the bandwidth capacity of each link in the network topology as an element of the bandwidth resource matrix, wherein each row represents a network node, each column represents another node, and the element represents the bandwidth capacity of the link between the two nodes; if a node has no link to another node, the element in the corresponding row and column is 0;
s122, constructing a time delay resource matrix: taking the time delay of each link in the network topology as an element of the time delay resource matrix, wherein each row represents a network node, each column represents another node, and the element represents the time delay of the link between the two nodes; if two nodes are not connected, the element at the corresponding position is 0;
s123, constructing an energy consumption resource matrix: taking the energy consumption of each node in the network topology as an element of the energy consumption resource matrix, wherein each row represents one network node, each column represents a time instant, and the element represents the energy consumption of the node at that time;
s124, constructing a storage space resource matrix: taking the storage space capacity of each node in the network topology as an element of the storage space resource matrix, wherein each row represents one network node and there is only one column, the element representing the storage space capacity of the node;
s125, combining the four resource matrices in S121-S124 by a dimension-expansion method to obtain a four-dimensional resource matrix. In this multidimensional resource matrix, each dimension represents one resource, and each element represents the value of the corresponding resource on the corresponding node or link. Through such a combination, the states and distribution of the various resources in the ad hoc network can be considered simultaneously, enabling more comprehensive analysis and decision-making.
As a preferred technical solution of the present application, the step S2 specifically includes:
step S21: the graph neural network is a neural network model composed of a graph convolutional network (Graph Convolutional Network, GCN) and a fully connected layer; the graph neural network adopted by the application is GraphSAGE (Graph Sample and Aggregate), a variant of the GCN.
GraphSAGE, as a GCN variant, effectively exploits the information in the graph structure by sampling and aggregating the features of node neighbors and generates low-dimensional representations of the nodes, making it suitable for processing large-scale graph data while maintaining high accuracy and efficiency;
the GraphSAGE network takes an adjacent matrix and a resource matrix as inputs, randomly initializes an initial representation vector for each node, samples each node from neighbor nodes, determines a neighbor node subset to be aggregated, and generates an aggregated representation vector by adopting average aggregation operation according to the representation vector of the neighbor node and the characteristics in the resource matrix;
aggregating neighbor information:
h_{N(v)}^{(l)} = AGGREGATE^{(l)}({ h_u^{(l-1)} : u ∈ N(v) })
wherein h_{N(v)}^{(l)} is the result of node v aggregating neighbor information at layer l, AGGREGATE^{(l)} is the aggregation function, N(v) is the neighbor node set of node v, and h_u^{(l-1)} is the representation vector of node u at layer l-1;
step S22: the updating layer fuses the aggregated node representation vector with the original node representation vector to generate an updated node representation vector;
updating the node representation:
h_v^{(l)} = CONCAT^{(l)}(h_v^{(l-1)}, h_{N(v)}^{(l)})
wherein h_v^{(l)} is the updated representation vector of node v at layer l, and CONCAT^{(l)} is a splicing function that splices the representation vector of node v at layer l-1 with the aggregated neighbor information.
The update layer typically follows each aggregation layer of the model, so that the node representation vectors are updated layer by layer. The output of each aggregation layer becomes the input of the next layer and is processed by the update layer. Through multiple iterations, the model gradually aggregates and updates node information to obtain increasingly accurate and rich node representations. After the GraphSAGE model has processed the adjacency matrix and the resource matrix, the resulting node representation takes the form of a two-dimensional matrix: each row corresponds to a node and each column corresponds to one dimension (feature) of the node representation vector. The whole matrix therefore covers the representation information of all nodes in the graph, and through the multi-layer aggregation and update operations the node representation vectors gradually fuse the information of their local neighbors, reflecting the feature expression of the nodes with respect to both the graph structure and the resource attributes.
As a preferred technical solution of the present application, the step S3 specifically includes:
the matrix which is output by the graph convolution neural network and contains the feature vector is used as a state representation of deep reinforcement learning and used for calculating a state cost function and selecting strategies, and when actions are selected according to the current state, an epsilon-greedy strategy is used for exploring new actions;
when the ε-greedy strategy is used to select actions, an ε value smaller than 1 is set, representing that random action exploration is carried out with a certain probability: a random number rand between 0 and 1 is generated, and a random action is selected if rand is smaller than ε; if rand is greater than or equal to ε, the action with the highest Q value is selected as the current action;
an exponential decay strategy is used for determining the value of ε: an initial ε value is selected when the model is built, and the ε value is updated by setting a decay factor β, which usually takes a value between 0 and 1:
ε_new = ε_old × β
wherein ε_new is the updated ε value and ε_old is the previous ε value;
the agent observes the obtained current state, selects an action according to the ε-greedy strategy and sends it to the environment, and the action is executed in the environment.
As a preferred technical solution of the present application, the state cost function can help us evaluate the quality of a certain state, so as to guide the selection of a policy; in the step S4, the state cost function is V(s):
V(s) = E[G_t | S_t = s]
wherein S_t denotes the state at time t, G_t denotes the expected cumulative return starting from time t, and E denotes the expectation operation;
calculating an action cost function:
the action cost function represents the expected return obtained by performing an action in a given state, denoted Q(s, a). The application uses the Q-learning algorithm to learn the action cost function, estimating the value of each state-action pair by maintaining a Q-value table, so that the action cost function can be continuously updated in the reinforcement learning process and reflect the current state and action values more accurately. The action cost function Q(s, a) is initialized to random values; in each time step t, the current state s_t is observed, an action a_t is selected according to the current state s_t and the action cost function Q, and the selected action a_t is executed to interact with the environment; after the interaction, the obtained experience data are collected and stored in the experience pool, and the new state s_{t+1} and the action cost function Q are used to calculate the target value target;
target = r_{t+1} + γ × max(Q(s_{t+1}, a_{t+1}))
wherein r_{t+1} is the reward, s_{t+1} is the new state, a_{t+1} is the optimal action in the new state, and γ is the discount factor used to balance the importance of current and future rewards;
updating the action cost function Q(s_t, a_t):
Q(s_t, a_t) = (1 - α) × Q(s_t, a_t) + α × target
wherein α is the learning rate; by continuously and iteratively updating the Q value, the optimal action cost function is finally obtained.
In step S4, the state-action cost function is updated, and the loss function is used to measure the difference between the predicted value and the target value.
The difference between the generated routing scheme distribution and the desired optimal routing distribution is compared using the KL divergence as the loss function. Weight coefficients are defined for the four network resources of bandwidth, time delay, energy consumption and storage space and applied to the KL divergence of each resource to obtain the loss function:
L = w_b × Σ_x p_b(x) log(p_b(x)/q_b(x)) + w_d × Σ_x p_d(x) log(p_d(x)/q_d(x)) + w_e × Σ_x p_e(x) log(p_e(x)/q_e(x)) + w_s × Σ_x p_s(x) log(p_s(x)/q_s(x))
wherein p_b(x), p_d(x), p_e(x), p_s(x) respectively denote the probabilities of the real resource distributions of bandwidth, time delay, energy consumption and storage space, q_b(x), q_d(x), q_e(x), q_s(x) respectively denote the probabilities of the resource distributions predicted by the model, and w_b, w_d, w_e, w_s denote the weight coefficients of the corresponding resources.
The KL divergence loss of the four resources including bandwidth, time delay, energy consumption and storage space is comprehensively considered, the importance of different resources in a loss function is adjusted by multiplying the KL divergence loss by a weight coefficient, and the optimization and balance of the resources are carried out according to actual demands.
The application also relates to a multi-resource route optimization system based on high-dimensional data joint optimization. According to the initial network state, an adjacency matrix and a multidimensional resource matrix of network resources are constructed as the input of the graph convolutional neural network; the graph neural network is used to aggregate and update the information of nodes and edges to obtain node and edge representation vectors with rich semantics. These vectors capture important information such as the structure, position and attributes of nodes and edges and the associations among the resources, combined with the features in the resource matrix; on the other hand, the parameters of the model can be updated according to the real-time network state and environmental changes, so that different routing scenarios can be adapted to;
the agent is configured to calculate the value of the current state based on the feature vector and determine the optimal action in the current state using the action cost function. The policy is used to select actions, the actions are executed in the environment, which returns feedback, and the experience data are stored in an experience pool for subsequent training. Training is performed by randomly drawing experience samples from the experience pool; random sampling breaks the correlation of the experience data and increases the diversity of training. During training, the agent constantly interacts with the environment, and new experience data are acquired and stored in the experience replay buffer. By repeatedly sampling at random from the experience replay buffer and training the neural network, the agent continuously improves its estimates of the policy and value functions through iterative training, improving its performance and robustness, and a deep reinforcement learning model is obtained that selects the optimal links so as to maximize the utilization of ad hoc network resources;
the trained network model is deployed in the ad hoc network environment, the optimal routing strategy that maximizes resource utilization in the ad hoc network is obtained according to the network state, and this routing strategy is applied to the target ad hoc network.
Compared with the prior art, the application has at least the following beneficial effects:
1. The graph neural network is adopted to effectively process graph data, including the relations, topological structure and attribute information among nodes, so that the patterns and associations of resource distribution can be better captured and the performance of the routing strategy is improved.
2. The multidimensional resource matrix can comprehensively consider a plurality of resource dimensions in the network, more comprehensively describe the distribution condition of network resources and perform more accurate route optimization.
3. The deep reinforcement learning combined with the graph neural network can perform self-adaptive learning, and the routing strategy is dynamically adjusted according to the resource conditions of different network environments. Through iteration and optimization of the reinforcement learning algorithm, the intelligent agent can constantly learn and adjust the decision of the optimal route, and adapt to the resource change and the real-time demand change in the self-organizing network.
4. The feature extraction and representation learning capability of the graph neural network and the decision and optimization capability of deep reinforcement learning enable the system to learn and discover the rules of network resource distribution, generate an efficient routing strategy and improve network performance and resource utilization efficiency.
Drawings
FIG. 1 is an overall block diagram of a multi-resource route optimization method based on high-dimensional data joint optimization.
Detailed Description
The following detailed description of preferred embodiments of the application is made in connection with the accompanying drawings, which form a part hereof, and together with the description of the embodiments of the application, are used to explain the principles of the application and are not intended to limit the scope of the application.
The application discloses a multi-resource route optimization method based on high-dimensional data joint optimization, which is shown in figure 1 and comprises the following steps:
step S1: obtaining the current network state of the ad hoc network and the topology information of the network, and constructing a graph adjacency matrix according to the connection states among the nodes; respectively constructing resource matrixes of each network resource based on link bandwidths, time delays, energy consumption and storage spaces of nodes in the current network state, and combining the constructed matrixes to form a multidimensional resource matrix;
the specific steps of step S1 are as follows:
step S11: a network topology diagram of the ad hoc network is obtained using a network protocol analysis tool, and an adjacency matrix is constructed from the topology diagram using the adjacency matrix representation, specifically as follows:
a matrix of size N × N is created, where N represents the number of nodes in the network. If there is a connection between node i and node j, the element in row i, column j of the matrix is set to 1; otherwise it is set to 0. The diagonal elements of the adjacency matrix generally represent a node's connection to itself and can be set to 0 or another value depending on the specific requirements.
Step S12: corresponding resource matrices are constructed respectively according to the bandwidth, time delay, energy consumption and storage space of each node in the target ad hoc network. The specific steps are as follows:
Constructing the bandwidth resource matrix: the bandwidth capacity of each link in the network topology is taken as an element of the bandwidth resource matrix; each row represents a network node, each column represents another node, and the element represents the bandwidth capacity of the link between the two nodes. If a node has no link to another node, the corresponding element in that node's row is 0.
Constructing the time delay resource matrix: the time delay of each link in the network topology is taken as an element of the time delay resource matrix; each row represents a network node, each column represents another node, and the element represents the time delay of the link between the two nodes. If two nodes are not connected, the element at the corresponding position is 0.
Constructing the energy consumption resource matrix: the energy consumption of each node in the network topology is taken as an element of the energy consumption resource matrix; each row represents a network node, each column represents a time instant, and the element represents the energy consumption of the node at that time.
Constructing the storage space resource matrix: the storage space capacity of each node in the network topology is taken as an element of the storage space resource matrix; each row represents a network node and there is only one column, the element representing the storage space capacity of the node.
Four resource matrixes are combined by adopting a dimension expansion method, the size N multiplied by M of each matrix is determined according to a specific network, and finally a four-dimensional resource matrix is obtained. In this multi-dimensional resource matrix, each dimension represents a resource, and each element represents the value of the corresponding resource on the corresponding node and link. By such combination, the states and distribution of various resources in the ad hoc network can be considered at the same time, so that more comprehensive analysis and decision can be performed.
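As an illustration only (not part of the claimed method), the following Python sketch shows one way the adjacency matrix of step S11 and the four resource matrices of step S12 could be assembled and stacked into a single multidimensional tensor; the node count, link list, example resource values, number of sampled time instants and the zero-padding used to give all matrices a common shape are assumptions introduced purely for the example.
```python
import numpy as np

N = 5                      # number of nodes (assumed for the example)
M = N                      # common column dimension used for stacking (assumption)

# --- Step S11: adjacency matrix from the topology (the link list is assumed) ---
links = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
A = np.zeros((N, N))
for i, j in links:
    A[i, j] = A[j, i] = 1          # undirected link between node i and node j
np.fill_diagonal(A, 0)             # diagonal: a node's connection to itself, set to 0 here

# --- Step S12: per-resource matrices (example values) ---
bandwidth = np.zeros((N, N))       # bandwidth of the link between two nodes, 0 if no link
delay     = np.zeros((N, N))       # time delay of the link between two nodes, 0 if no link
for i, j in links:
    bandwidth[i, j] = bandwidth[j, i] = np.random.uniform(10, 100)
    delay[i, j]     = delay[j, i]     = np.random.uniform(1, 20)

T = 4                                                 # sampled time instants (assumption)
energy  = np.random.uniform(0.1, 1.0, size=(N, T))    # per-node energy use over time
storage = np.random.uniform(16, 128, size=(N, 1))     # per-node storage capacity

def pad_to(mat, rows, cols):
    """Zero-pad a matrix to a common (rows, cols) shape so it can be stacked."""
    out = np.zeros((rows, cols))
    out[:mat.shape[0], :mat.shape[1]] = mat
    return out

# --- Step S125: stack along a new (resource) axis -> multidimensional matrix ---
resource_tensor = np.stack([pad_to(m, N, M)
                            for m in (bandwidth, delay, energy, storage)], axis=0)
print(A.shape, resource_tensor.shape)   # (5, 5) and (4, 5, 5)
```
Stacking along a new leading axis yields one tensor in which each slice corresponds to one resource, matching the description of combining the four matrices by dimension expansion.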
Step S2: transmitting the constructed adjacency matrix and the multidimensional resource matrix into a graph neural network, and obtaining feature vectors containing feature information of each node and topological structure attributes of the graph through network processing;
the specific steps of step S2 are as follows:
step S21, GraphSAGE takes the adjacency matrix and the resource matrix as inputs and randomly initializes an initial representation vector for each node. For each node, a new representation vector is generated through the aggregation operation from the representation vectors of its neighbor nodes and the features in the resource matrix; this representation vector contains the features of both the adjacency matrix and the resource matrix, after which the information of the node and its neighbors is interacted and fused.
Aggregating neighbor information:
h_{N(v)}^{(l)} = AGGREGATE^{(l)}({ h_u^{(l-1)} : u ∈ N(v) })
wherein h_{N(v)}^{(l)} is the result of node v aggregating neighbor information at layer l, AGGREGATE^{(l)} is the aggregation function, N(v) is the neighbor node set of node v, and h_u^{(l-1)} is the representation vector of node u at layer l-1.
The step of aggregating neighbor information is performed by the aggregation layer of GraphSAGE: a sample is drawn from the neighbor nodes of each node to determine the subset of neighbors to be aggregated, mean aggregation is applied to the representation vectors of those neighbors, and the result is fed into the aggregation function; in this process the features of both the adjacency matrix and the resource matrix are considered so as to fully exploit the information they provide.
Step S22, a new node representation vector is generated from the result of the aggregation function; the update layer is responsible for fusing the aggregated node representation vector with the original node representation vector to produce the updated node representation vector. The new representation vector thus contains the neighbor information obtained after aggregation and fusion.
Updating the node representation:
h_v^{(l)} = CONCAT^{(l)}(h_v^{(l-1)}, h_{N(v)}^{(l)})
wherein h_v^{(l)} is the updated representation vector of node v at layer l, and CONCAT^{(l)} is a splicing function that splices the representation vector of node v at layer l-1 with the aggregated neighbor information.
The update layer typically follows each aggregation layer of the model, so that the node representation vectors are updated layer by layer. The output of each aggregation layer becomes the input of the next layer and is processed by the update layer. Through multiple iterations, the model gradually aggregates and updates node information to obtain increasingly accurate and rich node representations. After the GraphSAGE model has processed the adjacency matrix and the resource matrix, the resulting node representation takes the form of a two-dimensional matrix: each row corresponds to a node and each column corresponds to one dimension (feature) of the node representation vector. The whole matrix therefore covers the representation information of all nodes in the graph, and through the multi-layer aggregation and update operations the node representation vectors gradually fuse the information of their local neighbors, reflecting the feature expression of the nodes with respect to both the graph structure and the resource attributes.
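As a minimal sketch of the aggregation and update operations above, the following Python function implements a GraphSAGE-style layer with neighbor sampling, mean aggregation, concatenation with the node's own vector and a linear transform; the layer widths, random weight initialization, ReLU nonlinearity, L2 normalization and neighbor sample size are assumptions made for the example rather than values fixed by the application.
```python
import numpy as np

rng = np.random.default_rng(0)

def graphsage_layer(H, A, W, sample_size=3):
    """One GraphSAGE-style layer: sample neighbors, mean-aggregate, concatenate, transform.

    H: (N, d_in) node representation vectors from the previous layer
    A: (N, N) adjacency matrix
    W: (2 * d_in, d_out) weight matrix applied to the concatenated vector [h_v ; h_N(v)]
    """
    N, d_in = H.shape
    H_new = np.zeros((N, W.shape[1]))
    for v in range(N):
        neigh = np.flatnonzero(A[v])                 # neighbor set N(v)
        if len(neigh) > sample_size:                 # sample a fixed-size subset
            neigh = rng.choice(neigh, sample_size, replace=False)
        h_agg = H[neigh].mean(axis=0) if len(neigh) else np.zeros(d_in)
        h_cat = np.concatenate([H[v], h_agg])        # CONCAT(h_v^(l-1), h_N(v)^(l))
        H_new[v] = np.maximum(h_cat @ W, 0)          # linear transform + ReLU (assumption)
    # row-wise L2 normalization, as commonly done in GraphSAGE
    return H_new / (np.linalg.norm(H_new, axis=1, keepdims=True) + 1e-8)

# Example: two stacked layers over randomly initialized node features
# (random features stand in for features built from the resource matrix)
N, d = 5, 8
A = (rng.random((N, N)) < 0.4).astype(float)
np.fill_diagonal(A, 0)
A = np.maximum(A, A.T)                               # symmetric adjacency matrix
H0 = rng.normal(size=(N, d))                         # random initial representation vectors
W1 = rng.normal(scale=0.1, size=(2 * d, 16))
W2 = rng.normal(scale=0.1, size=(2 * 16, 16))
H1 = graphsage_layer(H0, A, W1)
H2 = graphsage_layer(H1, A, W2)                      # (N, 16) feature vectors -> RL state
```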
Step S3: the feature vector output by the graph neural network is used as a state representation of deep reinforcement learning, and the value of the current state is calculated by using the feature vector. Selecting an action according to the current state using a strategy, sending the action to the environment, executing the action so as to realize interaction between the intelligent agent and the environment, and storing feedback of the environment in an experience pool for subsequent strategy updating;
the specific steps of step S3 are as follows:
and taking the matrix which is output by the graph convolution neural network and contains the feature vector as a state representation of deep reinforcement learning, and using the matrix for calculating a state cost function and selecting a strategy. In particular, the value of the current state may be calculated using the feature vector. When selecting an action based on the current state, an e-greedy strategy is used to explore the new action.
When an epsilon-greedy strategy is used for selecting actions, an epsilon value smaller than 1 is set, and random action exploration is performed under a certain probability. Generating a random number rand between 0 and 1, and selecting random action if the rand is smaller than E; if rand is greater than or equal to E, the action with the highest Q value is selected as the current action.
An exponential decay strategy is used for determining the value of epsilon, a reasonable initial epsilon value is selected according to experience of similar problems or results of previous experiments when a model is built, and epsilon is multiplied by beta each time by setting a decay factor beta, so that epsilon is gradually reduced.
ε_new = ε_old × β
wherein ε_new is the updated ε value, ε_old is the previous ε value, and β is the decay factor, typically taking a value between 0 and 1.
The agent observes the obtained current state, selects an action according to the ε-greedy strategy, sends it to the environment, and the action is executed in the environment. After the interaction, the environment returns feedback including the immediate reward r_{t+1} and the new state s_{t+1}.
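A small sketch of the ε-greedy selection with exponential decay described above; the number of actions, the decay factor and the lower bound kept on ε are assumed values for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, otherwise the argmax-Q action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore: random action
    return int(np.argmax(q_values))               # exploit: highest-Q action

epsilon, beta, eps_min = 1.0, 0.995, 0.05         # initial value, decay factor, floor (assumed)
for step in range(1000):
    q_values = rng.normal(size=4)                 # stand-in for Q(s, .) of the current state
    action = epsilon_greedy(q_values, epsilon)
    epsilon = max(eps_min, epsilon * beta)        # exponential decay: eps_new = eps_old * beta
```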
The experience replay technique is used to optimize the learning effect of the model and improve its stability and robustness. The agent interacts with the environment and, at each time step, records the current state, the selected action, the environment's feedback (the reward signal) and the next state, and stores these experiences in an experience pool in a first-in, first-out manner so that a certain number of recent experiences are retained. Experience replay makes full use of the existing experience data rather than only the experience of the current moment, improving data efficiency; by adjusting sampling probabilities or priorities, more important experiences can be selected more frequently for training.
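A possible sketch of such a first-in-first-out experience pool with uniform random sampling is given below; the capacity and batch size are illustrative assumptions, and the prioritized sampling mentioned above would replace the uniform random.sample call with weighted sampling.
```python
import random
from collections import deque

class ReplayBuffer:
    """First-in-first-out experience pool with uniform random sampling."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)      # oldest experiences are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buffer, batch_size)   # break temporal correlation
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)

# Usage with dummy transitions, just to show the interface
buffer = ReplayBuffer()
for t in range(100):
    buffer.push(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
states, actions, rewards, next_states, dones = buffer.sample(batch_size=32)
```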
Step S4: the state cost function and the action cost function are updated based on feedback from the network environment. The agent periodically takes experience samples from a pool of experiences and uses these experiences to update its policy and cost function. The state-cost function can help us evaluate the quality of a certain state, thereby guiding the selection of policies. The action cost function is to calculate the value of each action based on the current state and the possible actions so as to select the action with the greatest value. Optimizing the state cost function and the action cost function through deep reinforcement learning, so as to obtain a strategy with the maximum accumulated return, namely an optimal strategy;
the specific steps of step S4 are as follows:
experience samples are extracted from the experience pool for updating strategies and action cost functions, and intelligent performance is gradually optimized.
The state-cost function can help us evaluate the quality of a certain state, thereby guiding the selection of policies.
V(s) = E[G_t | S_t = s]
wherein S_t denotes the state at time t, G_t denotes the expected cumulative return starting from time t, E denotes the expectation operation, and V(s) is the state cost function.
The action cost function is calculated as follows:
the action cost function represents the expected return, denoted Q (s, a), from performing an action in a given state. The application uses Q-learning algorithm to learn action cost function, and estimates the value of each state action pair by maintaining a Q value table, so that the action cost function can be continuously updated in the reinforcement learning process, and the current state and the action value can be reflected more accurately. Initializing the action cost function Q (s, a) to a random value, observing the current state s in each time step t t According to the current state s t And action cost function Q select action a t . Perform selected action a t Interacting with the environment, collecting and storing the obtained experience data into an experience pool after the interaction, and obtaining a new state s t+1 And an action cost function Q for calculating a target value target.
target=r t+1 +γ×max(Q(s t+1 ,a t+1 ))
wherein ,rt+1 For rewarding, s t+1 In a new state, a t+1 Is the optimal action of the new state, gamma is the discount factor, is used for balancingThe importance of current rewards and future rewards.
Updating the action cost function Q(s) t ,a t ):
Q(s t ,a t )=(1-α)×Q(s t ,a t )+α×target
And alpha is the learning rate, and the optimal action cost function is finally obtained by continuously and iteratively updating the Q value.
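The target and update rule above can be illustrated with a small tabular sketch; the state and action counts, the learning rate and the discount factor are assumed values, and in the full method the state index would be replaced by the feature vector produced by the graph neural network.
```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 16, 4               # assumed sizes for the sketch
Q = rng.random((n_states, n_actions))     # Q table initialized to random values
alpha, gamma = 0.1, 0.9                   # learning rate and discount factor (assumed)

def q_update(Q, s_t, a_t, r_next, s_next):
    """One Q-learning step: target = r + gamma * max_a Q(s', a), then blend with alpha."""
    target = r_next + gamma * np.max(Q[s_next])
    Q[s_t, a_t] = (1 - alpha) * Q[s_t, a_t] + alpha * target
    return Q

# Example transition (s_t, a_t, r_{t+1}, s_{t+1}) drawn arbitrarily
Q = q_update(Q, s_t=3, a_t=1, r_next=1.0, s_next=7)
```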
Updating the state-action cost function requires using the loss function to measure the gap between the predicted value and the target value.
The difference between the generated routing scheme distribution and the desired optimal routing distribution is compared using the KL divergence as the loss function. Weight coefficients are defined for the four network resources of bandwidth, time delay, energy consumption and storage space and applied to the KL divergence of each resource to obtain the loss function:
L = w_b × Σ_x p_b(x) log(p_b(x)/q_b(x)) + w_d × Σ_x p_d(x) log(p_d(x)/q_d(x)) + w_e × Σ_x p_e(x) log(p_e(x)/q_e(x)) + w_s × Σ_x p_s(x) log(p_s(x)/q_s(x))
wherein x represents the possible values of the corresponding resource (bandwidth, time delay, energy consumption, storage space), p_b(x), p_d(x), p_e(x), p_s(x) respectively denote the probabilities of the real resource distributions of bandwidth, time delay, energy consumption and storage space, q_b(x), q_d(x), q_e(x), q_s(x) respectively denote the probabilities of the resource distributions predicted by the model, and w_b, w_d, w_e, w_s denote the weight coefficients of the corresponding resources.
The KL divergence loss of the four resources including bandwidth, time delay, energy consumption and storage space is comprehensively considered, the importance of different resources in a loss function is adjusted by multiplying the KL divergence loss by a weight coefficient, and the optimization and balance of the resources are carried out according to actual demands.
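A short numpy sketch of this weighted KL-divergence loss follows; the example distributions and the weight values are placeholders chosen only so that the function can be run.
```python
import numpy as np

def weighted_kl_loss(p_list, q_list, weights, eps=1e-12):
    """Sum of per-resource KL divergences D_KL(p || q), each scaled by its weight."""
    loss = 0.0
    for p, q, w in zip(p_list, q_list, weights):
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        p, q = p / p.sum(), q / q.sum()                   # normalize to distributions
        loss += w * np.sum(p * np.log((p + eps) / (q + eps)))
    return loss

# Example: "true" vs. predicted distributions for bandwidth, delay, energy, storage
p_b, p_d, p_e, p_s = [np.array([0.4, 0.3, 0.3])] * 4      # placeholder real distributions
q_b, q_d, q_e, q_s = [np.array([0.5, 0.25, 0.25])] * 4    # placeholder predicted distributions
w = [0.4, 0.3, 0.2, 0.1]                                  # assumed resource weights
print(weighted_kl_loss([p_b, p_d, p_e, p_s], [q_b, q_d, q_e, q_s], w))
```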
The embodiment of the application also discloses a system for the multi-resource route optimization method based on high-dimensional data joint optimization, referring to FIG. 1. In order to make full use of the network resources in the ad hoc network, an adjacency matrix and a multidimensional resource matrix of network resources are constructed according to the initial network state as the input of the graph convolutional neural network; the graph neural network is used to aggregate and update the information of nodes and edges to obtain node and edge representation vectors with rich semantics. These vectors capture important information such as the structure, position and attributes of nodes and edges and the associations among the resources, combined with the features in the resource matrix; on the other hand, the parameters of the model can be updated according to the real-time network state and environmental changes, so that different routing scenarios can be adapted to;
the agent is configured to calculate the value of the current state based on the feature vector and determine the optimal action in the current state using the action cost function. The policy is used to select actions, the actions are executed in the environment, which returns feedback, and the experience data are stored in an experience pool for subsequent training. Training is performed by randomly drawing experience samples from the experience pool; random sampling breaks the correlation of the experience data and increases the diversity of training. During training, the agent constantly interacts with the environment, and new experience data are acquired and stored in the experience replay buffer. By repeatedly sampling at random from the experience replay buffer and training the neural network, the agent continuously improves its estimates of the policy and value functions through iterative training, improving its performance and robustness, and a deep reinforcement learning model is obtained that selects the optimal links so as to maximize the utilization of ad hoc network resources;
the trained network model is deployed in the ad hoc network environment, the optimal routing strategy that maximizes resource utilization in the ad hoc network is obtained according to the network state, and this routing strategy is applied to the target ad hoc network.
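To show how the components described above fit together, the following self-contained toy loop combines ε-greedy selection, a first-in-first-out experience pool with random replay and the value update in one training cycle; the toy environment, its random reward and every hyperparameter are invented solely for this sketch and stand in for the real ad hoc network environment and the deep value network.
```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(0)

class ToyAdHocEnv:
    """Toy stand-in for the ad hoc network environment; states are pre-encoded indices."""
    def __init__(self, n_states=16, n_actions=4):
        self.n_states, self.n_actions = n_states, n_actions
    def reset(self):
        self.t = 0
        return int(rng.integers(self.n_states))
    def step(self, action):
        self.t += 1
        next_state = int(rng.integers(self.n_states))
        reward = float(rng.normal())              # stand-in for the QoS / resource reward
        return next_state, reward, self.t >= 20

env = ToyAdHocEnv()
Q = np.zeros((env.n_states, env.n_actions))
buffer = deque(maxlen=5000)                       # experience pool
alpha, gamma, epsilon, beta, batch = 0.1, 0.9, 1.0, 0.99, 32

for episode in range(300):
    s, done = env.reset(), False
    while not done:
        a = int(rng.integers(env.n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s2, r, done = env.step(a)
        buffer.append((s, a, r, s2))
        if len(buffer) >= batch:                  # replay: update from random experiences
            for bs, ba, br, bs2 in random.sample(buffer, batch):
                target = br + gamma * np.max(Q[bs2])
                Q[bs, ba] = (1 - alpha) * Q[bs, ba] + alpha * target
        s = s2
    epsilon = max(0.05, epsilon * beta)           # decay exploration between episodes
```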
The present application is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present application are intended to be included in the scope of the present application.
Claims (6)
1. A multi-resource route optimization method based on high-dimensional data joint optimization is characterized by comprising the following steps:
step S1: acquiring the current network state of the ad hoc network and the topology information of the network, and constructing an adjacency matrix according to the connection states among the nodes; respectively constructing resource matrixes of each network resource based on link bandwidths, time delays, energy consumption and storage spaces of nodes in the current network state, and combining the constructed matrixes to form a multidimensional resource matrix;
step S2: transmitting the constructed adjacency matrix and the multidimensional resource matrix into a graph neural network, and processing to obtain feature vectors containing feature information of each node and topological structure attributes of the graph;
step S3: the feature vector output by the graph neural network is used as a state representation of deep reinforcement learning, the value of the current state is calculated by using the feature vector, an action is selected by using a strategy according to the current state and is sent to the environment, the action is executed so as to realize interaction between an intelligent agent and the environment, and feedback of the environment is stored in an experience pool for subsequent strategy updating;
step S4: and updating the state cost function and the action cost function according to the feedback of the network environment, and optimizing the state cost function and the action cost function through deep reinforcement learning to obtain the strategy of maximum accumulated return, namely the optimal strategy.
2. The multi-resource route optimization method based on high-dimensional data joint optimization according to claim 1, wherein the specific steps of the step S1 are as follows:
step S11: obtaining a network topology diagram of the ad hoc network by using a network protocol analysis tool, and constructing an adjacency matrix by using an adjacency matrix representation method according to the topology diagram;
creating a matrix of size N x N, where N represents the number of nodes in the ad hoc network, and if there is a connection between node i and node j, setting the ith row and jth column elements of the matrix to 1; otherwise, set to 0; the diagonal elements of the adjacency matrix are set to 0 or other value;
step S12: aiming at each node in the target ad hoc network, respectively constructing a corresponding resource matrix according to the bandwidth, time delay, energy consumption and storage space of the node;
s121, constructing a bandwidth resource matrix: taking the bandwidth capacity of each link in the network topology as an element of the bandwidth resource matrix, wherein each row represents a network node, each column represents another node, and the element represents the bandwidth capacity of the link between the two nodes; if a node has no link to another node, the element in the corresponding row and column is 0;
s122, constructing a time delay resource matrix: taking the time delay of each link in the network topology as an element of the time delay resource matrix, wherein each row represents a network node, each column represents another node, and the element represents the time delay of the link between the two nodes; if two nodes are not connected, the element at the corresponding position is 0;
s123, constructing an energy consumption resource matrix: taking the energy consumption of each node in the network topology as an element of the energy consumption resource matrix, wherein each row represents one network node, each column represents a time instant, and the element represents the energy consumption of the node at that time;
s124, constructing a storage space resource matrix: taking the storage space capacity of each node in the network topology as an element of a storage space resource matrix, wherein each row represents one network node, and only one column exists, and the element represents the storage space capacity of the node;
s125, combining the four resource matrixes in S121-S124 by adopting an extended dimension method to obtain a four-dimensional resource matrix.
3. The multi-resource route optimization method based on high-dimensional data joint optimization according to claim 1, wherein the step S2 specifically includes:
step S21: the GraphSAGE network takes an adjacent matrix and a resource matrix as inputs, randomly initializes an initial representation vector for each node, samples each node from neighbor nodes, determines a neighbor node subset to be aggregated, and generates an aggregated representation vector by adopting average aggregation operation according to the representation vector of the neighbor node and the characteristics in the resource matrix;
aggregating neighbor information:
h_{N(v)}^{(l)} = AGGREGATE^{(l)}({ h_u^{(l-1)} : u ∈ N(v) })
wherein h_{N(v)}^{(l)} is the result of node v aggregating neighbor information at layer l, AGGREGATE^{(l)} is the aggregation function, N(v) is the neighbor node set of node v, and h_u^{(l-1)} is the representation vector of node u at layer l-1;
step S22: the updating layer fuses the aggregated node representation vector with the original node representation vector to generate an updated node representation vector;
updating the node representation:
h_v^{(l)} = CONCAT^{(l)}(h_v^{(l-1)}, h_{N(v)}^{(l)})
wherein h_v^{(l)} is the updated representation vector of node v at layer l, and CONCAT^{(l)} is a splicing function that splices the representation vector of node v at layer l-1 with the aggregated neighbor information.
4. The multi-resource route optimization method based on high-dimensional data joint optimization according to claim 1, wherein the step S3 specifically includes:
the matrix which is output by the graph convolution neural network and contains the feature vector is used as a state representation of deep reinforcement learning and used for calculating a state cost function and selecting strategies, and when actions are selected according to the current state, an epsilon-greedy strategy is used for exploring new actions;
when an epsilon-greedy strategy is used for selecting actions, setting an ε value smaller than 1, representing that random action exploration is carried out with a certain probability: generating a random number rand between 0 and 1, and selecting a random action if rand is smaller than ε; if rand is greater than or equal to ε, selecting the action with the highest Q value as the current action;
an exponential decay strategy is used for determining the value of ε: an initial ε value is selected when the model is built, and the ε value is updated by setting a decay factor β:
ε_new = ε_old × β
wherein ε_new is the updated ε value and ε_old is the previous ε value;
the agent observes the obtained current state, selects an action according to the epsilon-greedy strategy and sends it to the environment, and the action is executed in the environment.
5. The multi-resource route optimization method based on high-dimensional data joint optimization according to claim 4, wherein the state cost function in the step S4 is V(s):
V(s) = E[G_t | S_t = s]
wherein S_t denotes the state at time t, G_t denotes the expected cumulative return starting from time t, and E denotes the expectation operation;
calculating an action cost function:
using the Q-learning algorithm to learn the action cost function: the action cost function Q(s, a) is initialized to a random value; in each time step t, the current state s_t is observed, an action a_t is selected according to the current state s_t and the action cost function Q, and the selected action a_t is executed to interact with the environment; after the interaction, the obtained experience data are collected and stored in the experience pool, and the new state s_{t+1} and the action cost function Q are used to calculate the target value target;
target = r_{t+1} + γ × max(Q(s_{t+1}, a_{t+1}))
wherein r_{t+1} is the reward, s_{t+1} is the new state, a_{t+1} is the optimal action in the new state, and γ is the discount factor used to balance the importance of current and future rewards;
updating the action cost function Q(s_t, a_t):
Q(s_t, a_t) = (1 - α) × Q(s_t, a_t) + α × target
wherein α is the learning rate; by continuously and iteratively updating the Q value, the optimal action cost function is finally obtained.
6. The multi-resource route optimization method based on high-dimensional data joint optimization according to claim 4 or 5, wherein the state action cost function is updated in step S4, and the difference between the predicted value and the target value is measured by using the loss function;
defining weight coefficients for the four network resources of bandwidth, time delay, energy consumption and storage space, and applying the weight coefficients to the KL divergence of each resource to obtain the loss function:
L = w_b × Σ_x p_b(x) log(p_b(x)/q_b(x)) + w_d × Σ_x p_d(x) log(p_d(x)/q_d(x)) + w_e × Σ_x p_e(x) log(p_e(x)/q_e(x)) + w_s × Σ_x p_s(x) log(p_s(x)/q_s(x))
wherein p_b(x), p_d(x), p_e(x), p_s(x) respectively denote the probabilities of the real resource distributions of bandwidth, time delay, energy consumption and storage space, q_b(x), q_d(x), q_e(x), q_s(x) respectively denote the probabilities of the resource distributions predicted by the model, and w_b, w_d, w_e, w_s denote the weight coefficients of the corresponding resources.
Priority Applications (1)
- CN202310736074.0A | Priority date: 2023-06-20 | Filing date: 2023-06-20 | Title: Multi-resource route optimization method based on high-dimensional data joint optimization
Publications (1)
- CN116847425A | Publication date: 2023-10-03 | Status: Pending
Family
- ID=88171832
Cited By (1)
- CN117808627A | Priority date: 2023-12-29 | Publication date: 2024-04-02 | Assignee: 光谷技术有限公司 | Title: Energy consumption supervision method and related device
Legal Events
- PB01 | Publication
- SE01 | Entry into force of request for substantive examination