[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN117041132B - Distributed load balancing satellite routing method based on deep reinforcement learning - Google Patents

Distributed load balancing satellite routing method based on deep reinforcement learning Download PDF

Info

Publication number
CN117041132B
CN117041132B CN202311292725.8A CN202311292725A CN117041132B CN 117041132 B CN117041132 B CN 117041132B CN 202311292725 A CN202311292725 A CN 202311292725A CN 117041132 B CN117041132 B CN 117041132B
Authority
CN
China
Prior art keywords
satellite
satellite node
node
neighbor
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311292725.8A
Other languages
Chinese (zh)
Other versions
CN117041132A (en
Inventor
吕志成
岳洋
倪少杰
牟卫华
唐小妹
李宗楠
李蓬蓬
马春江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202311292725.8A priority Critical patent/CN117041132B/en
Publication of CN117041132A publication Critical patent/CN117041132A/en
Application granted granted Critical
Publication of CN117041132B publication Critical patent/CN117041132B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/02Topology update or discovery
    • H04L45/08Learning-based routing, e.g. using neural networks or artificial intelligence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/14Relay systems
    • H04B7/15Active relay systems
    • H04B7/185Space-based or airborne stations; Stations for satellite systems
    • H04B7/18578Satellite systems for providing broadband data service to individual earth stations
    • H04B7/18584Arrangements for data networking, i.e. for data packet routing, for congestion control
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/12Shortest path evaluation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00Routing or path finding of packets in data switching networks
    • H04L45/24Multipath
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/12Avoiding congestion; Recovering from congestion
    • H04L47/125Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Astronomy & Astrophysics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application relates to a distributed load balancing satellite routing method based on deep reinforcement learning, which comprises the steps of obtaining the queue occupancy rate and the queue state identification of a neighbor satellite node by sensing the data buffer queue load condition of the neighbor satellite node; on one hand, the geographic distance factors of the neighbor nodes and the destination nodes are combined with the topological distance factors to measure the proximity degree of the neighbor nodes and the destination nodes, the obtained topological-geographic distance factors can better reflect the proximity degree between the satellite nodes in the same hemisphere or different hemispheres and can be used as a driving factor for good satellite network routing to accelerate the convergence of a routing algorithm, so that communication data flows can be transmitted according to the path with the shortest transmission distance under the condition of low load in the routing forwarding process, and the nodes with high load can be timely bypassed, thereby reducing the congestion condition of the nodes as much as possible, realizing the load balance of data flow transmission, reducing the network packet loss rate and improving the throughput.

Description

Distributed load balancing satellite routing method based on deep reinforcement learning
Technical Field
The application relates to the technical field of wireless communication, in particular to a distributed load balancing satellite routing method based on deep reinforcement learning.
Background
In recent years, low-orbit satellite communication systems represented by "Iridium" (Iridium), "global star" (GlobalStar), "one-net" (OneWeb), and "star link" (Starlink) have become the dominant trend of new-generation satellite communication systems, and have driven the hot trend of the construction of global low-orbit satellite constellations. The low orbit satellite constellation has remarkable advantages in the aspects of satellite quantity, orbit height, space environment, signal intensity, batch development, rapid deployment and the like, and can possibly bring about great changes of satellite communication, navigation, remote sensing, investigation, monitoring, early warning detection and other technologies and influence the overall pattern of the space electronic information field. The satellite network taking the low orbit satellite constellation as the core gets rid of the physical limit that the traditional ground network needs to build stations, can realize global continuous coverage, and provides real-time communication service with the characteristics of low delay, high speed, high power, high flux and the like.
The low orbit Satellite communication system establishes Inter-Satellite Link (ISL) between satellites to form a space communication network with the satellites as switching nodes, so as to realize point-to-point communication between ground stations and satellites and ground users. The satellite network using the inter-satellite link networking can reduce the deployment number of ground gateway stations, reduce the complexity of a ground section system, and form a data relay system independent of the ground, thereby truly becoming an independent and important component in an air-ground integrated information network. For low orbit satellite communication networks comprising a large number of satellite nodes and inter-satellite links, the selection of routing strategies becomes a key to ensure that data flows are timely and accurately transmitted and submitted. Because of the communication data between users located in different geographic locations on the ground, between the users and the ground station, and between the ground station and the ground station, relay transmission is performed between satellite nodes via a satellite network based on inter-satellite link networking, and the routing strategy determines the path selection of the relay transmission of the data stream in the satellite network. Different from the ground network, each node in the satellite network is used as a route forwarding node, but is in continuous motion, the earth and the satellite constellation keep relative motion, and the ground network service request also has the condition of uneven geographic distribution, so that the traffic transmission of the satellite Internet has high dynamic property, and the phenomenon of uneven network traffic distribution is easy to occur. Because of factors such as dynamic property of the low orbit satellite network, imbalance of network bandwidth and storage resources, imbalance of network traffic load and the like, for the same pair of source nodes and destination nodes, communication quality of different transmission paths is quite different, so how to plan an efficient and reliable path for communication requests of each user becomes a key problem of satellite network routing technology research.
Disclosure of Invention
Based on the above technical problems, it is necessary to provide a distributed load balancing satellite routing method based on deep reinforcement learning, which aims to solve the technical problem of efficient and reliable inter-satellite link network routing design under the condition that the network traffic load is unbalanced in the low-orbit satellite communication system, and design a system capable of sensing the state of the low-orbit communication satellite network and the traffic load change condition in real time, optimizing and adjusting the signal propagation path, and realizing the load balancing and efficient transmission of the satellite network data traffic.
A distributed load balancing satellite routing method, the method comprising:
obtaining a current inter-satellite link according to the low-orbit satellite network topology snapshot of the current time slot; including a plurality of satellite nodes in an inter-satellite link; each satellite node corresponds to a routing decision agent;
in an inter-satellite link under a current time slot, calculating to obtain geographic distance factors of a neighbor satellite node and a target satellite node according to space coordinates of the neighbor satellite node and the target satellite node of the current satellite node, calculating to obtain topological distance factors of the neighbor satellite node and the target satellite node according to virtual topological addresses of the neighbor satellite node and the target satellite node of the current satellite node, and obtaining a topological-geographic distance factor according to the geographic distance factors and the corresponding topological distance factors; the virtual topology address comprises an orbit number of the satellite node and an in-orbit sequence number of the satellite node;
the queue state identifier, the queue occupancy rate and the topology-geographic distance factor of each neighbor satellite node and the target satellite node are used as observation state information under the current time slot and are input into a deep neural network of a routing decision agent corresponding to the current satellite node, and the state-action value of the current satellite node for forwarding the data stream to each neighbor satellite node is output;
and selecting the neighbor satellite node corresponding to the maximum state-action value as the next hop forwarding node of the current satellite node, and completing the data forwarding task of the current time slot.
According to the distributed load balancing satellite routing method based on deep reinforcement learning, on one hand, the queue occupancy rate and the queue state identification of the neighbor satellite nodes are obtained by sensing the data cache queue load condition of the neighbor satellite nodes; on one hand, the geographic distance factors and the topological distance factors of the neighbor nodes and the destination nodes are combined to measure the proximity degree of the neighbor nodes and the destination nodes, wherein the topological distance factors consider the track surface proximity degree and the in-track sequential proximity degree of 2 satellite nodes, so that the topological-geographic distance factors can better reflect the proximity degree between the satellite nodes in the same hemisphere or different hemispheres and can be used as driving factors for good satellite network routing to accelerate routing algorithm convergence, communication data flows can be transmitted according to the shortest transmission distance path under the condition of low load in the routing forwarding process, and nodes with high load can be timely bypassed, so that the node congestion condition is reduced as much as possible, the load balance of data flow transmission is realized, the network packet loss rate is reduced, and the throughput is improved.
Drawings
FIG. 1 is a schematic flow chart of a distributed load balancing satellite routing method based on deep reinforcement learning;
FIG. 2 is a schematic diagram of a geographic location-based "greedy" routing strategy;
FIG. 3 is a schematic diagram of a reinforcement learning technique framework;
FIG. 4 is a deep neural network architecture of a distributed load balancing satellite routing method based on deep reinforcement learning;
FIG. 5 is a system implementation architecture of a distributed load balancing satellite routing method based on deep reinforcement learning;
fig. 6 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, a distributed load balancing satellite routing method based on deep reinforcement learning is provided, which includes the following steps:
step 102, obtaining the current inter-satellite link according to the low orbit satellite network topology snapshot of the current time slot.
The inter-satellite link comprises a plurality of satellite nodes, and each satellite node corresponds to one routing decision agent.
The topology control method using the virtual topology strategy divides the low orbit satellite network topology into a plurality of time slots, the satellite network in each time slot can be regarded as static topology, namely, the connection relation between satellite nodes is not changed, and the static topology in each time slot is the inter-satellite link under the corresponding time slot.
104, in the inter-satellite link under the current time slot, calculating to obtain the geographic distance factors of the neighbor satellite node and the target satellite node according to the space coordinates of the neighbor satellite node and the target satellite node of the current satellite node, calculating to obtain the topological distance factors of the neighbor satellite node and the target satellite node according to the virtual topological addresses of the neighbor satellite node and the target satellite node of the current satellite node, and obtaining the topological-geographic distance factors according to the geographic distance factors and the corresponding topological distance factors.
The virtual topology address comprises the track number of the satellite node and the sequence number in the track.
As shown in fig. 2, a geographic location based "greedy" routing policy schematic is provided. In a low orbit satellite network, for a source nodeData stream generated by point S and destination node DIf a satellite node closer to the destination node in geospatial is selected for each hop forwarding, then +.>Will experience the least number of forwarding hops. In most cases, the shortest propagation distance path is included in the plurality of minimum hop paths, and the shortest propagation distance path and the minimum hop path are not greatly different in propagation delay. Therefore, the greedy routing strategy generated by continuously approaching the destination node from the geographic distance can quickly find the minimum hop path reaching the destination node D, so that the optimal time delay performance is realized.
As can be seen from fig. 2, the data flowAt node->And the destination node is +.>When selecting node +>Geographical location as next hop node will be closer to the destination node +.>. Therefore, in the scene of lighter network load, the greedy selection strategy based on the geographic position is favorable for quickly finding the transmission path with the minimum forwarding hop number, and the data flow is realized>The end-to-end delay experienced is minimized. In addition to calculating the proximity between satellite nodes using spatial coordinates, the transmission distance between satellite nodes can also be measured by grid coordinates due to the grid characteristics of the low orbit satellite network topology. SatelliteThe meshing of the network can model the spherical space network topology into grid coordinates of a two-dimensional plane, and the approach degree between two satellite nodes can be measured through the topology coordinates. The application therefore proposes a metric method combining geographical distance and topological distance to characterize the proximity between two nodes for driving route calculation, i.e. topological-geographical distance factor (Topologic and Geographic Distance Factor, TGDF). The topological and geographic distance factors TGDP consist of two parts, a topological distance factor (Topologic Distance Factor, TDF) and a geographic distance factor (Geographic Distance Factor, GDF). In topology time slot->Inside, satellite node->Three-dimensional geographic coordinates and virtual topological addresses of (a) are respectivelyAnd->Wherein->For track number>Sequentially numbered within the track.
The advantage of using TGDF values instead of solely TDF or GDF is: the TGDF can well reflect the proximity degree between satellite nodes in the same hemisphere or different hemispheres, so the TGDF can be used as a driving factor for good satellite network routing to accelerate the convergence of a routing algorithm.
And 106, taking the queue state identifier, the queue occupancy rate and the topology-geographic distance factor of each neighbor satellite node and the target satellite node as the observation state information under the current time slot, inputting the observation state information into the deep neural network of the routing decision agent corresponding to the current satellite node, and outputting the state-action value of the current satellite node for forwarding the data stream to each neighbor satellite node.
In a traditional satellite network communication scene based on TCP/IP, most congestion control in a transmission layer protocol is 'passive', namely, other nodes of the network are informed of adopting congestion control strategies only after congestion and packet loss occur to the nodes, so that the data sending rate of the congested nodes is reduced, and serious hysteresis exists in congestion control, so that the congestion control cannot be predicted in advance and the path is avoided when the nodes are about to congestion and packet loss.
In order to timely sense the traffic load condition of each node of the network and thus predict the queue time delay in the node, the data flow is causedThe method can comprehensively measure the propagation delay and the queue delay when each hop of forwarding is carried out, and the satellite node can timely sense the queue load condition of surrounding nodes and the distance to a destination node through state information exchange among the nodes. The queue load condition comprises a queue state identifier and a queue occupancy rate, and the distance to the destination node is a topology-geographic distance factor.
And step 108, selecting the neighbor satellite node corresponding to the maximum state-action value as the next hop forwarding node of the current satellite node, and completing the data forwarding task of the current time slot.
In the distributed load balancing satellite routing method based on deep reinforcement learning, on one hand, the queue occupancy rate and the queue state identification of the neighbor satellite nodes are obtained by sensing the data buffer queue load condition of the neighbor satellite nodes; on one hand, the geographic distance factors and the topological distance factors of the neighbor nodes and the destination nodes are combined to measure the proximity degree of the neighbor nodes and the destination nodes, wherein the topological distance factors consider the track surface proximity degree and the in-track sequential proximity degree of 2 satellite nodes, so that the topological-geographic distance factors can better reflect the proximity degree between the satellite nodes in the same hemisphere or different hemispheres and can be used as driving factors for good satellite network routing to accelerate routing algorithm convergence, communication data flows can be transmitted according to the shortest transmission distance path under the condition of low load in the routing forwarding process, and nodes with high load can be timely bypassed, so that the node congestion condition is reduced as much as possible, the load balance of data flow transmission is realized, the network packet loss rate is reduced, and the throughput is improved.
In one embodiment, a HELLO/ACK detection response message mechanism is adopted to enable the satellite nodes to exchange state information through inter-satellite links; the status information includes: the IP address, virtual topology address, space coordinates, queue occupancy, and time stamp of the satellite node.
Satellite nodeReceiving neighboring satellite nodes via inter-satellite link>The transmitted HELLO notification message is analyzed to contain the node +.>State information of (2) while satellite node->Also->And sending an ACK response message containing the self state information. The HELLO/ACK message contains information such as an IP address, a virtual topology address, space coordinates, a queue occupancy rate, and a timestamp of the transmitting node.
In a low-orbit satellite network, the transmission priority of HELLO/ACK messages in the satellite network is higher than that of service data packets, so that each satellite node can timely receive state information from surrounding nodes, the distributed routing decision is guaranteed to have good instantaneity, and the high-dynamic satellite network environment can be rapidly dealt with. To ensure real-time link state update, the satellite nodes will update at time intervalsPeriodically to neighbor nodesAnd sending an HELLO notification message to ensure that the neighbor node can grasp the queue and link state information in real time to assist in routing decision. When the satellite node sends HELLO notification message to the neighbor node, the monitoring period is set up>If the monitoring period is->Receiving an ACK response message fed back by the neighbor node, wherein the ACK response message indicates that communication can be established with the neighbor node, and discarding an outdated HELLO acknowledgement packet; otherwise, if the monitoring period is passed->And if the HELLO_ACK confirmation message fed back by the neighbor node is not received yet, the communication link with the neighbor node is considered to be interrupted, and at the moment, the link state is set to be disconnected in the running state table of the neighbor node maintained by the node, and the communication connection is not reestablished until the HELLO_ACK confirmation message fed back by the node is received. The neighbor node queue monitoring strategy based on the HELLO notification message can timely sense the running state of the neighbor node so as to estimate the queue time delay of the neighbor node, and provides sufficient routing decision information for the next hop forwarding of the data packet.
In one embodiment, the geographic distance factor of the neighboring satellite node and the target satellite node is calculated according to the spatial coordinates of the neighboring satellite node and the target satellite node of the current satellite node:
wherein,representing time slot->Lower->Neighbor satellite nodes of the individual satellite nodes, +.>Representing the target satellite node>Representing time slot->Lower->The spatial coordinates of the neighboring satellite nodes of the individual satellite nodes,representing the spatial coordinates of the target satellite node, +.>Representing time slot->Lower->Geographic distance factor of neighbor satellite node and target satellite node of individual satellite node, +.>Indicating the speed of light.
From the above equation, the propagation delay of the inter-satellite link between the GDF value and the node is equal, so that the geographic distance between the two nodes is intuitively reflected.
In one embodiment, the topology distance factor of the neighbor satellite node and the target satellite node is calculated according to the virtual topology addresses of the neighbor satellite node and the target satellite node of the current satellite node, and is:
wherein,representing time slot->Lower->The orbital numbers of the neighboring satellite nodes of the individual satellite nodes,representing time slot->Lower->In-orbit sequence numbering of neighbor satellite nodes of the individual satellite nodes,/->Indicates the location orbit number of the target satellite node, < +.>In-orbit sequence number representing the target satellite node, +.>Representing time slot->Lower->Topological distance factor of neighbor satellite node and target satellite node of individual satellite node, +.>Influence factor representing the track surface proximity on the topological distance factor, +.>Influence factor representing sequential proximity in orbit on topological distance factor, +.>Representing the number of satellites in each orbit.
As can be seen from the above description,the smaller the node->And->The closer the track surface is, the smaller the TDF value is, reflecting +.>And->The closer; />Representing node->And->Minimum number of hops in north-south direction, when ∈>And->The closer the in-track sequence is, the smaller the TDF value is, also indicating +.>And->The smaller the distance between them.
In one embodiment, the topology-to-geographic distance factor is derived from the geographic distance factor and the corresponding topological distance factor as:
wherein,representing time slot->Lower->Topology-geographical distance factor of neighbor satellite node and target satellite node of individual satellite node, +.>Representing the influence factor of the topological distance factor on the topological-geographical distance factor,/>The impact factor of the geographical distance factor on the topology-geographical distance factor is represented.
In one embodiment, the queue status identifier, the queue occupancy rate, and the topology-geographical distance factor of each neighboring satellite node and the target satellite node are used as the observation status information in the current time slot and are input into the deep neural network of the routing decision agent corresponding to the current satellite node.
Obtaining a pre-constructed reward function; the expression of the reward function is:
wherein,representing time slot->Lower->Bonus function of individual satellite nodes,/->Representing time slot->Lower->Queue status identification of neighbor satellite nodes of the individual satellite nodes,/-for>Representing time slot->Lower->The occupancy of the queues of the neighboring satellite nodes of the individual satellite nodes.
The rewarding function is used as a 'baton' for driving the intelligent agent to correct the action strategy in reinforcement learning, and for the routing decision intelligent agent, a certain adjacent link point is selected as a next hop forwarding node, and the obtained rewarding of the action of the next hop node directly reflects the potential end-to-end time delay performance of the next hop node to the destination node. The TGDF between the selected next-hop node and the destination node and the queue occupancy QOR of the next-hop node are taken into a reward function for comprehensively evaluating the end-to-end delay performance of the node. In->The next-hop node selected by the routing decision agent is known by the rewarding function expression, and when the next-hop node is the destination node, the agent obtains great positive rewards, which indicates that the transmission task is completed; when the next hop node is not the destination node, the intelligent agent obtains negative rewards, and the negative rewards in the route transmission process can excite the routing decision of the intelligent agent to approach the destination node as soon as possible; queue status identification +.>When the node is in an idle state, the reward function is only related to TGDF between the next hop node and the destination node, and the smaller the TGDF is, the closer the TGDF is to the destination node, so that the obtained negative reward is smaller; queue status identifier of next hop node +.>When the node is in a medium load state, the rewarding function starts to consider the queue occupancy QOR of the node; queue status identifier of next hop node +.>And when the node is in a busy state, the influence weight of the queue occupancy rate of the next-hop node on the reward function is further improved, and the queue delay of the next-hop node plays a dominant role in the end-to-end delay performance.
And obtaining the state-action value of the current satellite node for forwarding the data stream to each neighbor satellite node according to the observed state information and the reward function.
In one embodiment, the step of constructing the deep neural network of routing decision agents comprises:
constructing a first deep neural network and a second deep neural network; the network structures of the first deep neural network and the second deep neural network are the same; the first deep neural network is used as a main network and used for estimating state-action value in real time; the second deep neural network serves as a target network for calculating target state-action values and providing the target state-action values to the first deep neural network for updating network parameters.
The state-action value refers to the optimal state-action value objective function Q value based on the Belman equation, and the calculation formula is as follows:
wherein the method comprises the steps ofIs the target Q value, +.>For rewarding function->To estimate->Network (S)>For the next state of observation, the next time,for the purpose of->Network (S)>To estimate->Parameters of the network->For the purpose of->Parameters of the network.
One DNN is a target Q value network, the other DNN is an estimated Q value network, parameters of the estimated Q value network are all assigned to the target Q value network every time the estimated Q value network is trained, the network used for determining the next hop forwarding node is the estimated Q value network, the target Q network is used for updating parameters of the estimated Q value network more reasonably, and overestimation of the estimated Q value by the estimated Q value network in the training process is prevented.
The method adopts a classical Deep reinforcement learning Double-DQN (Double Deep Q-Network) algorithm to realize a satellite Network routing algorithm based on TGDF driving. The Double-DQN algorithm combines a deep neural network (Deep Neural Networks, DNN) with a reinforced Learning Q-Learning algorithm, and the Q value of the state-action cost function is fitted by using strong nonlinear fitting capacity of the DNN, so that the problem that the required storage space of a Q value table which records the state-action cost function is increased sharply along with the increase of the state space in the Q-Learning algorithm, and the algorithm convergence rate is obviously reduced is solved.
Reinforcement learning is one of the branches of machine learning, mainly used to solve the problem of continuous decision, and can learn how to achieve a set goal in a complex, uncertain environment. As a learning algorithm of 'reward and punishment', reinforcement learning does not depend on priori knowledge of a system, an accurate mathematical model does not need to be established, the problem of inaccurate model design is avoided, and parameters of the algorithm are trained by using reward and punishment signals given by the environment. The basic principle of reinforcement learning is: if the decision-making body receives a positive reward signal (strengthening signal) of the environment after performing a certain action, the tendency of the decision-making body to perform the current action will be strengthened; conversely, the decision body's tendency to perform this action is reduced, and the reinforcement learning framework is shown in FIG. 3. The framework of reinforcement learning mainly consists of five parts, namely Agent (Agent), environment (environment), state (Action), action (Action) and Reward (report). The intelligent agent is a core of reinforcement learning and has the ability of sensing and learning; the state is the parameter form of the intelligent agent in the current environment; the action is an execution operation that is subject to the agent; the environment is the object explored by the agent; rewards are judgments on decision making behaviors of the agent in the current state according to information provided by the environment. After the agent performs a certain action, the environment will be changed to a new state, and a reward signal (positive reward or negative reward) will be given to the new state environment, the agent's strategy is updated according to a certain learning rate, and a new action is performed, and the process is repeated until the agent learns the decision strategy that can obtain the maximum reward gain. The process is a mode that the intelligent agent and the environment interact through states, actions and rewards.
First, five elements of the reinforcement learning model framework in the TGDF-DRL method are determined:
(1) An agent. The distributed thought is adopted, routing decision agents are deployed inside each satellite node, the satellite nodes continuously acquire environmental state information according to a HELLO notification message mechanism, when the satellite nodes need to transmit data streams, the Double-DQN network inside the nodes takes the state information as input and then outputs Q values of all actions taken in the state, and the nodes transmit the data streams to the next hop according to the actions selected by the greedy method.
(2) And (5) routing the environment. The routing environment is the topology state geometry of the whole low orbit satellite network, and the data flowIs located at satellite node->Data stream->The environmental information perceived by the next hop routing comes only from +.>Neighbor node geometry->
(3) State space. At the position ofTime of day, data relay node->State information from neighboring nodes will be observed from the routing environment. Since the overall goal of the routing algorithm is to transmit data with the smallest possible number of hops to minimize propagation delay and queuing delay, the neighbor nodes are assembled ∈ ->To the destination node->TGDF incorporation of (F)Status information to measure +.>The distance that the neighbor node reaches the destination node indicates that the node is positioned in +.>To the destination node->The greater the probability of the smallest end-to-end delay path between. However, TGDF can only measure the potential propagation delay of the data stream, and in order to measure the queue delay of the neighbor node so as to more comprehensively evaluate the end-to-end delay performance of the next hop node, the neighbor node is assembled->Queue occupancy +.>Queue status identifier +.>Status information is included.
Thus, for the followingTime satellite node set->Set of destination nodes->The state space is defined as:
since the routing decision agent is deployed on each low-orbit satellite, for a single satellite nodeThe observed state information is:
satellite nodeObserved state->Finite dimensional vector for conciseness and intuition, and due to state information and current node + ->Neighbor node->Destination node->The self information is irrelevant, and is only used for describing the potential end-to-end delay performance of the task target of ' the data packet at the current node reaches the destination node ', so that the network parameter convergence is irrelevant to the specific combination of the current node and the destination node ', and the routing strategy has good convergence performance and generalization for the whole network node.
(4) An action space. At the position ofTime of day, satellite node->The data stream needs to be->To the next node, so the action space is set to node +.>Is>I.e. action space->
(5) A bonus function.
After modeling a satellite routing scene and reinforcement learning elements, the interior of a satellite node is required to store environment information observed by deep neural network DNN learning, and the correlation between the environment information and a next hop forwarding decision is established according to a reward function rule. The satellite node routing decision agent based on Double-DQN needs to store two DNNs with the same network structure, wherein one DNN is used as a main network MainNet for estimating the Q value in real time, and the other DNN is used as a target network TargetNet for calculating the target Q value and providing a parameter update target for the main network MainNet. For both the primary network and the target network, the network inputs are observed statesThe method comprises the steps of including TGDF of the adjacent node, queue occupancy rate QOR and queue state identification QSF, and outputting the data as the current stateStatus of next actions-action value +.>The deep neural network structure of the method is shown in fig. 4.
After the router feels that the agent finishes training, the next hop forwarding node can be output through the input observation state, and the data packet is routed and forwarded. At the position ofTime of day, satellite node->Firstly, obtaining neighbor node ++according to the resolved HELLO message>Geographic coordinates, topological coordinates, queue occupancy, and queue status identification for the object to be processedTransmitted data stream->Resolving corresponding destination node->Is then node +.>The internal processor calculates +.>And will、/>And->As the input of the routing decision agent, the routing decision agent outputs the state-action cost function Q value corresponding to the node of the next hop selected under the current observation, node +.>According to->Greedy method selects data stream ++>And sending the data to the next hop forwarding node, thereby completing the data forwarding task at the current moment.
As shown in fig. 5, a system implementation architecture of the present method is provided. Where TTL is an abbreviation for Time To Live, which specifies the maximum number of segments allowed To pass before an IP packet is dropped by the router. The method takes the deep reinforcement learning agent as a routing decision core, and the routing decision agent gives a next hop forwarding node by sensing the load condition of a data buffer queue of a neighbor node and combining the approach degree between the neighbor node and a destination node, so that a communication data stream can be transmitted according to the path with the shortest transmission distance under the condition of low load in the routing forwarding process, and the node with high load can be timely bypassed, thereby reducing the node congestion condition as much as possible, realizing the load balance of data flow transmission, reducing the network packet loss rate and improving the throughput.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, nor do the order in which the sub-steps or stages are performed necessarily performed in sequence, but may be performed alternately or alternately with at least a portion of other steps or sub-steps of other steps.
In one embodiment, a distributed load balancing satellite routing apparatus based on deep reinforcement learning is provided, comprising: the system comprises a topology snapshot acquisition module, a distance factor calculation module, a state-action value calculation module and a data forwarding module, wherein:
the topology snapshot acquisition module is used for acquiring a current inter-satellite link according to the low-orbit satellite network topology snapshot of the current time slot; including a plurality of satellite nodes in an inter-satellite link; each satellite node corresponds to a routing decision agent;
the distance factor calculation module is used for calculating geographic distance factors of the neighbor satellite nodes and the target satellite nodes according to the space coordinates of the neighbor satellite nodes and the target satellite nodes of the current satellite node in the inter-satellite link under the current time slot, calculating the topological distance factors of the neighbor satellite nodes and the target satellite nodes according to the virtual topological addresses of the neighbor satellite nodes and the target satellite nodes of the current satellite node, and obtaining the topological-geographic distance factors according to the geographic distance factors and the corresponding topological distance factors; the virtual topology address comprises an orbit number of the satellite node and an in-orbit sequence number of the satellite node;
the state-action value calculation module is used for taking a queue state identifier, a queue occupancy rate and a topology-geographic distance factor of each neighbor satellite node and a target satellite node as observation state information under the current time slot, inputting the observation state information into a deep neural network of a routing decision agent corresponding to the current satellite node, and outputting the state-action value of the current satellite node for forwarding the data stream to each neighbor satellite node;
and the data forwarding module is used for selecting the neighbor satellite node corresponding to the maximum state-action value as the next hop forwarding node of the current satellite node and completing the data forwarding task of the current time slot.
Specific limitations regarding the deep reinforcement learning-based distributed load balancing satellite routing apparatus may be found in the above description of the deep reinforcement learning-based distributed load balancing satellite routing method, and will not be described herein. The various modules in the distributed load balancing satellite routing apparatus described above may be implemented in whole or in part in software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing satellite node position, queue load condition and other data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by a processor implements a distributed load balancing satellite routing method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment a computer device is provided comprising a memory storing a computer program and a processor implementing the steps of the method of the above embodiments when the computer program is executed.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method of the above embodiments.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (7)

1. A distributed load balancing satellite routing method based on deep reinforcement learning, the method comprising:
obtaining a current inter-satellite link according to the low-orbit satellite network topology snapshot of the current time slot; including a plurality of satellite nodes in an inter-satellite link; each satellite node corresponds to a routing decision agent;
in an inter-satellite link under a current time slot, calculating to obtain geographic distance factors of a neighbor satellite node and a target satellite node according to space coordinates of the neighbor satellite node and the target satellite node of the current satellite node, calculating to obtain topological distance factors of the neighbor satellite node and the target satellite node according to virtual topological addresses of the neighbor satellite node and the target satellite node of the current satellite node, and obtaining a topological-geographic distance factor according to the geographic distance factors and the corresponding topological distance factors; the virtual topology address comprises an orbit number of the satellite node and an in-orbit sequence number of the satellite node;
the queue state identifier, the queue occupancy rate and the topology-geographic distance factor of each neighbor satellite node and the target satellite node are used as observation state information under the current time slot and are input into a deep neural network of a routing decision agent corresponding to the current satellite node, and the state-action value of the current satellite node for forwarding the data stream to each neighbor satellite node is output;
and selecting the neighbor satellite node corresponding to the maximum state-action value as the next hop forwarding node of the current satellite node, and completing the data forwarding task of the current time slot.
2. The method of claim 1, wherein calculating the geographic distance factor of the neighboring satellite node and the target satellite node from the spatial coordinates of the neighboring satellite node and the target satellite node of the current satellite node comprises:
the geographic distance factors of the neighbor satellite node and the target satellite node are obtained by calculation according to the space coordinates of the neighbor satellite node and the target satellite node of the current satellite node:
wherein,representing time slot->Lower->Neighbor satellite nodes of the individual satellite nodes, +.>Which represents the point of view of the target satellite node,representing time slot->Lower->The spatial coordinates of the neighboring satellite nodes of the individual satellite nodes,representing the spatial coordinates of the target satellite node, +.>Representing time slot->Lower->Geographic distance factor of neighbor satellite node and target satellite node of individual satellite node, +.>Indicating the speed of light.
3. The method of claim 2, wherein calculating the topology distance factor of the neighboring satellite node and the target satellite node from the virtual topology addresses of the neighboring satellite node and the target satellite node of the current satellite node comprises:
the topological distance factors of the neighbor satellite nodes and the target satellite nodes are obtained by calculation according to the virtual topological addresses of the neighbor satellite nodes and the target satellite nodes of the current satellite node, and are as follows:
wherein,representing time slot->Lower->The track number of the neighbor satellite node of the individual satellite node,/->Representing time slot->Lower->In-orbit sequence numbering of neighbor satellite nodes of the individual satellite nodes,/->Indicates the location orbit number of the target satellite node, < +.>In-orbit sequence number representing the target satellite node, +.>Representing time slot->Lower->Topological distance factor of neighbor satellite node and target satellite node of individual satellite node, +.>Influence factor representing the track surface proximity on the topological distance factor, +.>Influence factor representing sequential proximity in orbit on topological distance factor, +.>Representing the number of satellites in each orbit.
4. A method according to claim 3, wherein deriving the topology-to-geographical distance factor from the geographical distance factor and the corresponding topology distance factor comprises:
obtaining a topology-geographic distance factor according to the geographic distance factor and the corresponding topological distance factor, wherein the topology-geographic distance factor is as follows:
wherein,representing time slot->Lower->Topology-geographical distance factor of neighbor satellite node and target satellite node of individual satellite node, +.>Representing the influence factor of the topological distance factor on the topological-geographical distance factor,/>The impact factor of the geographical distance factor on the topology-geographical distance factor is represented.
5. The method of claim 4, wherein the step of constructing the deep neural network of routing decision agents comprises:
constructing a first deep neural network and a second deep neural network; the network structures of the first deep neural network and the second deep neural network are the same; the first deep neural network is used as a main network and used for estimating state-action value in real time; the second deep neural network serves as a target network for calculating target state-action values and providing the target state-action values to the first deep neural network for updating network parameters.
6. The method of claim 5, wherein taking the queue status identifier, the queue occupancy, and the topology-geographical distance factor of each neighboring satellite node and the target satellite node as the observation status information in the current time slot and inputting the observation status information into the deep neural network of the routing decision agent corresponding to the current satellite node, outputting the status-action value of the current satellite node for forwarding the data stream to each neighboring satellite node, and the method comprises:
the queue state identifier, the queue occupancy rate and the topology-geographic distance factor of each neighbor satellite node and the target satellite node are used as the observation state information under the current time slot and are input into the deep neural network of the routing decision agent corresponding to the current satellite node;
obtaining a pre-constructed reward function; the expression of the reward function is:
wherein,representing time slot->Lower->Bonus function of individual satellite nodes,/->Representing time slot->Lower->Queue status identification of neighbor satellite nodes of the individual satellite nodes,/-for>Representing time slot->Lower->Queue occupancy of neighbor satellite nodes of the individual satellite nodes;
and obtaining the state-action value of the current satellite node for forwarding the data stream to each neighbor satellite node according to the observed state information and the reward function.
7. The method according to claim 1, wherein the method further comprises:
adopting a HELLO/ACK detection response message mechanism to enable the satellite nodes to exchange state information through inter-satellite links; the status information includes: the IP address, virtual topology address, space coordinates, queue occupancy, and time stamp of the satellite node.
CN202311292725.8A 2023-10-08 2023-10-08 Distributed load balancing satellite routing method based on deep reinforcement learning Active CN117041132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311292725.8A CN117041132B (en) 2023-10-08 2023-10-08 Distributed load balancing satellite routing method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311292725.8A CN117041132B (en) 2023-10-08 2023-10-08 Distributed load balancing satellite routing method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN117041132A CN117041132A (en) 2023-11-10
CN117041132B true CN117041132B (en) 2023-12-08

Family

ID=88635848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311292725.8A Active CN117041132B (en) 2023-10-08 2023-10-08 Distributed load balancing satellite routing method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN117041132B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117395188B (en) * 2023-12-07 2024-03-12 南京信息工程大学 Deep reinforcement learning-based heaven-earth integrated load balancing routing method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110012516A (en) * 2019-03-28 2019-07-12 北京邮电大学 A kind of low orbit satellite routing policy method based on deeply study framework
CN111416655A (en) * 2020-04-07 2020-07-14 南京邮电大学 Low-orbit satellite routing improvement method based on virtual topology
CN114567365A (en) * 2022-02-16 2022-05-31 北京电子科技学院 Routing method and system for low-earth-orbit satellite network load balancing
CN116390164A (en) * 2023-04-11 2023-07-04 西安电子科技大学 Low orbit satellite network trusted load balancing routing method, system, equipment and medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110012516A (en) * 2019-03-28 2019-07-12 北京邮电大学 A kind of low orbit satellite routing policy method based on deeply study framework
CN111416655A (en) * 2020-04-07 2020-07-14 南京邮电大学 Low-orbit satellite routing improvement method based on virtual topology
CN114567365A (en) * 2022-02-16 2022-05-31 北京电子科技学院 Routing method and system for low-earth-orbit satellite network load balancing
CN116390164A (en) * 2023-04-11 2023-07-04 西安电子科技大学 Low orbit satellite network trusted load balancing routing method, system, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于树突神经网络的低轨卫星智能感知路由算法;刘洋 等;《工程科学学报》;第45卷(第3期);全文 *

Also Published As

Publication number Publication date
CN117041132A (en) 2023-11-10

Similar Documents

Publication Publication Date Title
US9191304B1 (en) Reinforcement learning-based distributed network routing method utilizing integrated tracking and selective sweeping
Kumar et al. Collaborative learning automata-based routing for rescue operations in dense urban regions using vehicular sensor networks
US20050190717A1 (en) MANET routing based on best estimate of expected position
Dudukovich et al. A machine learning concept for DTN routing
WO2021209801A1 (en) Predicting wireless quality of service (qos) for connected vehicles
Du et al. A routing protocol for UAV-assisted vehicular delay tolerant networks
CN117041132B (en) Distributed load balancing satellite routing method based on deep reinforcement learning
Mahajan et al. Adaptive routing in wireless mesh networks using hybrid reinforcement learning algorithm
Tropea et al. Reactive flooding versus link state routing for FANET in precision agriculture
CN113507722A (en) Implementation method of NS 3-based platform for controlling congestion of low-orbit satellite
Madoery et al. Routing heterogeneous traffic in delay-tolerant satellite networks
Cigliano et al. A Machine Learning approach for routing in satellite Mega-Constellations
Zhou et al. Adaptive Routing Strategy Based on Improved Double Q‐Learning for Satellite Internet of Things
Liang et al. The effect of routing under local information using a social insect metaphor
Desai et al. Cooperative reinforcement learning approach for routing in ad hoc networks
Blose et al. Scalable Hybrid Switching-Driven Software Defined Networking Issue: From the Perspective of Reinforcement Learning
Alliche et al. Impact evaluation of control signalling onto distributed learning-based packet routing
Lent Adaptive DTN routing: A neuromorphic networking perspective
Vendramin et al. A Greedy Ant Colony Optimization for routing in delay tolerant networks
Lent Evaluation of cognitive routing for the interplanetary internet
Sun et al. MAMRL: Exploiting Multi-agent Meta Reinforcement Learning in WAN Traffic Engineering
Kruthi et al. Reliable wireless sensor network using ant colony optimization (ACO)
III Building routing overlays in disrupted networks: inferring contacts in challenged sensor internetworks
Ouferhat et al. QoS dynamic routing for wireless sensor networks
Zhang et al. Flowlet-Level Routing Optimization with GNN-Based Multi-Agent Deep Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant