CN118282968B - Method for shielding outdated congestion feedback in network of data center under lossless network - Google Patents
- Publication number
- CN118282968B (application CN202410689076.3A)
- Authority
- CN
- China
- Prior art keywords
- cnp
- outdated
- network
- source switch
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/26—Flow control; Congestion control using explicit feedback to the source, e.g. choke packets
- H04L47/263—Rate modification at the source after receiving feedback
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/12—Avoiding congestion; Recovering from congestion
- H04L47/125—Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The embodiment of the invention provides an in-network method for shielding outdated congestion feedback in a lossless data center network, which belongs to the technical field of data processing and specifically comprises the following steps: step 1, during data flow transmission in a lossless data center network, when a link is congested, a switch marks data packets with an ECN mark, and when the receiving-end host detects the ECN mark, it sends a CNP to the source switch to notify the sending end to adjust its rate; step 2, the lossless data center network performs load balancing, and the data flow is switched from the old path to a new path; and step 3, outdated CNPs are distinguished, and the sending end adjusts its rate according to the CNPs. With the scheme of the invention, outdated CNPs are distinguished after a path switch and unnecessary outdated CNPs are selectively discarded at the source switch, which reduces the erroneous influence of outdated CNPs on rate adjustment of the flow on the new path and improves both the accuracy of host-side rate adjustment and the transmission performance of the lossless data center network.
Description
Technical Field
The embodiment of the invention relates to the technical field of data processing, and in particular to an in-network method for shielding outdated congestion feedback in a lossless data center network.
Background
Currently, data centers provide infrastructure services for massive traditional and new applications, including delay-sensitive services such as web search, online recommendation systems, instant messaging, and the like, and computationally intensive services such as high-performance computing, distributed machine learning, and the like. The above applications require that the data center network provide high bandwidth and low latency guarantees. However, due to the ever-increasing memory and CPU/GPU device performance and the ever-increasing scale of distributed AI training, data center networks have become the performance bottleneck for distributed applications in recent years.
For RDMA (Remote Direct Memory Access) based lossless networks deployed in HPC (High Performance Computing) clusters, intelligent computing centers, and the like, directly migrating conventional load balancing techniques from lossy networks to lossless networks causes a number of performance problems, such as packet reordering and congestion spreading. To optimize multi-path RDMA transmission performance in data centers, researchers have proposed a series of mechanisms in transmission control, load balancing, and related areas to improve network transmission performance.
However, when a load balancing method is deployed in a lossless data center network in the same way as in a lossy data center network, for example under RoCEv2, the outdated CNPs (Congestion Notification Packets) caused by load balancing can have an erroneous effect on rate adjustment of the data flow on the new path. For example, when load balancing switches an RDMA data flow from a congested path to an uncongested path in good transmission condition, the CNPs generated by congestion on the old path are still delivered to the sender host as usual, so the sender host misuses the congestion feedback of the old path to adjust its sending rate on the new path, which suppresses RDMA transmission performance on the new path. In addition, RDMA transmission control is implemented in the hardware network card of the host, so modifying and redeploying the network protocol on the hardware network cards of all servers is very difficult, and the storage and computing resources of the host network card are limited.
Therefore, there is a need for an in-network method for shielding outdated congestion feedback in a lossless data center network that improves the accuracy of host-side rate adjustment and the transmission performance of the lossless data center network.
Disclosure of Invention
In view of the above, the embodiment of the invention provides an in-network method for shielding outdated congestion feedback in a lossless data center network, which at least partially solves the prior-art problems of inaccurate host-side rate adjustment and poor transmission performance of the lossless data center network.
The embodiment of the invention provides an in-network method for shielding outdated congestion feedback in a lossless data center network, which comprises the following steps:
Step 1, during data flow transmission in the lossless data center network, when a link is congested, a switch marks data packets with an ECN mark, and when the receiving-end host detects the ECN mark, it sends a CNP to the source switch to notify the sending end to adjust its rate;
Step 2, the lossless data center network performs load balancing, and the data flow is switched from the old path to a new path;
Step 3, outdated CNPs are distinguished, and the sending end adjusts its rate according to the CNPs;
The step 3 specifically includes:
Judging whether the network transmission delay can be accurately monitored; if so, performing rate adjustment according to the time point at which the CNP arrives; if not, constructing a time record table based on a notification-reply mode and performing rate adjustment;
The step of performing rate adjustment according to the time point at which the CNP arrives comprises the following steps:
The source switch records the time point $t_{\mathrm{CNP}}$ at which a CNP arrives and the earliest time point $t_{\mathrm{new}}$ at which a CNP generated on the new path can arrive;
Comparing $t_{\mathrm{CNP}}$ with $t_{\mathrm{new}}$: if $t_{\mathrm{CNP}}$ is before $t_{\mathrm{new}}$, the CNP is judged to be an outdated CNP generated on the old path, and the source switch selectively discards it;
If $t_{\mathrm{CNP}}$ is after $t_{\mathrm{new}}$, the CNP is judged to be a CNP generated on the new path, the source switch sends it to the sending end, and the sending end adjusts its rate according to the CNP;
The step of constructing a time record table based on the notification-reply mode and performing rate adjustment comprises the following steps:
When a data flow switches paths, the source switch immediately sends a path switching notification message to the destination switch;
After receiving the path switching notification message from the source switch, the destination switch sends a path switching reply message to the source switch;
A CNP that arrives before the source switch's state lookup shows that the path switching reply message has been received is treated as an outdated CNP; the number of outdated CNPs is counted, the average number of outdated CNP arrivals is calculated, and both values are updated in a data table; the source switch calculates the outdated CNP discard probability from the average number of outdated CNP arrivals and discards outdated CNPs according to the comparison of the outdated CNP discard probability with a threshold; a CNP that arrives after the source switch's state lookup shows that the path switching reply message has been received is treated as a normal CNP, which the source switch sends to the sending end, and the sending end adjusts its rate according to the CNP.
According to a specific implementation of the embodiment of the invention, the average number of outdated CNP arrivals $\mathit{avgCNP}$ is expressed as
$$\mathit{avgCNP} = (1 - w)\cdot \mathit{avgCNP} + w \cdot \mathit{CountP}$$
wherein $\mathit{CountP}$ is the number of outdated CNPs that actually arrive, and $w$ is the weight between the actual number of outdated CNP arrivals and the average.
According to a specific implementation of the embodiment of the invention, the outdated CNP discard probability $P$ is expressed as
$$P = P_{\max} \cdot \frac{\mathit{avgCNP}}{T}$$
wherein $P_{\max}$ is the maximum discard probability of an outdated CNP, and $T$ represents a threshold.
According to a specific implementation of the embodiment of the invention, the step of discarding outdated CNPs according to the comparison of the outdated CNP discard probability with the threshold comprises:
When the number of outdated CNPs has not reached the threshold $T$, outdated CNPs are discarded with the discard probability $P$, which increases linearly with the number of CNP arrivals; when the number of CNPs reaches the threshold $T$, all outdated CNPs are discarded, so as to smooth rate adjustment at the sending end.
The in-network scheme for shielding outdated congestion feedback in a lossless data center network in the embodiment of the invention comprises the following steps: step 1, during data flow transmission in the lossless data center network, when a link is congested, a switch marks data packets with an ECN mark, and when the receiving-end host detects the ECN mark, it sends a CNP to the source switch to notify the sending end to adjust its rate; step 2, the lossless data center network performs load balancing, and the data flow is switched from the old path to a new path; step 3, outdated CNPs are distinguished, and the sending end adjusts its rate according to the CNPs. Step 3 specifically includes: judging whether the network transmission delay can be accurately monitored; if so, performing rate adjustment according to the time point at which the CNP arrives; if not, constructing a time record table based on a notification-reply mode and performing rate adjustment. The step of performing rate adjustment according to the time point at which the CNP arrives comprises: the source switch records the time point $t_{\mathrm{CNP}}$ at which a CNP arrives and the earliest time point $t_{\mathrm{new}}$ at which a CNP generated on the new path can arrive, and compares the two; if $t_{\mathrm{CNP}}$ is before $t_{\mathrm{new}}$, the CNP is judged to be an outdated CNP generated on the old path, and the source switch selectively discards it; if $t_{\mathrm{CNP}}$ is after $t_{\mathrm{new}}$, the CNP is judged to be a CNP generated on the new path, the source switch sends it to the sending end, and the sending end adjusts its rate according to the CNP. The step of constructing a time record table based on the notification-reply mode and performing rate adjustment comprises: when a data flow switches paths, the source switch immediately sends a path switching notification message to the destination switch; after receiving the path switching notification message, the destination switch sends a path switching reply message to the source switch; a CNP that arrives before the source switch's state lookup shows that the path switching reply message has been received is treated as an outdated CNP; the number of outdated CNPs is counted, the average number of outdated CNP arrivals is calculated, and both values are updated in a data table; the source switch calculates the outdated CNP discard probability from the average number of outdated CNP arrivals and discards outdated CNPs according to the comparison of the outdated CNP discard probability with a threshold; a CNP that arrives after the source switch's state lookup shows that the path switching reply message has been received is treated as a normal CNP, which the source switch sends to the sending end, and the sending end adjusts its rate according to the CNP.
The beneficial effects of the embodiment of the invention are as follows: the scheme addresses the problem that, when load balancing switches paths, outdated CNPs generated on the old path can affect the sending end's rate adjustment of the flow on the new path, and provides an in-network method for shielding outdated congestion feedback in a lossless data center network. After a path switch, the source switch either compares the time point at which a CNP arrives with the earliest time point at which a CNP generated on the new path can arrive, or checks the state of the path switching reply message, and selectively discards unnecessary outdated CNPs. This reduces the erroneous influence of outdated CNPs on rate adjustment of the flow on the new path, helps improve link bandwidth utilization of the lossless network, reduces RDMA transmission delay, requires no modification of the host-side network protocol, and improves both the accuracy of host-side rate adjustment and the transmission performance of the lossless data center network.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow diagram of an in-network method for shielding outdated congestion feedback in a lossless data center network according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the functional structure of an in-network method for shielding outdated congestion feedback in a lossless data center network according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a specific architecture of an in-network method for shielding outdated congestion feedback in a lossless data center network according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a time axis for distinguishing outdated CNPs when the network transmission delay cannot be accurately monitored, according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the discard probability of outdated CNPs according to an embodiment of the present invention;
Fig. 6 is a test data diagram of an in-network method for shielding outdated congestion feedback in a lossless data center network according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. The invention may also be practiced or applied in other, different embodiments, and the details of the present description may be modified or varied without departing from the spirit and scope of the present invention. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where no conflict arises. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are intended to fall within the scope of the invention.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the invention provides an in-network method for shielding outdated congestion feedback in a lossless data center network, which can be applied to data transmission in Internet scenarios.
Referring to Fig. 1, a flow chart of an in-network method for shielding outdated congestion feedback in a lossless data center network according to an embodiment of the present invention is shown. As shown in Fig. 1, the method mainly comprises the following steps:
Step 1, during data flow transmission in the lossless data center network, when a link is congested, a switch marks data packets with an ECN mark, and when the receiving-end host detects the ECN mark, it sends a CNP to the source switch to notify the sending end to adjust its rate;
In a specific implementation, Fig. 2 shows a functional structure diagram provided by an embodiment of the present invention. It is assumed that the network topology is the Leaf-Spine network commonly used in data centers and that switches with a programmable data plane are used. For each data flow entering the network, the switches are divided, along the transmission direction of the data flow, into a source switch S1, intermediate switches on two different paths L1 and L2, and a destination switch S2; the example network further includes a sender host H1 and a receiver host H2. The data flow f1 shown in Fig. 2 is sent from the sender host H1 and forwarded by the source switch S1; it is initially forwarded on path L1 and is switched to a new path L2 by load balancing. During transmission of data flow f1, once a specific threshold is exceeded, the switch marks the subsequent data packets with an explicit congestion notification (ECN) mark; after the ECN-marked data packets reach the receiver host, this information is converted into CNPs that are periodically fed back to the sender host H1.
That is, during data flow transmission in the lossless data center network, when a link is congested, the switch marks data packets with ECN, and when the receiving-end host detects that a data packet carries ECN, it sends a CNP to the source switch to notify the sending end to adjust its rate.
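Step 1 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the names `Packet`, `switch_forward`, `Receiver`, and `cnp_interval` are hypothetical, the marking rule is simplified to a single queue-length threshold, and "periodic" CNP feedback is modeled as at most one CNP per flow per interval.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Packet:
    flow_id: int
    ecn: bool = False  # set by a switch that observes congestion


def switch_forward(pkt: Packet, queue_len: int, ecn_threshold: int) -> Packet:
    """Mark the packet with ECN when the queue exceeds the threshold (assumed rule)."""
    if queue_len > ecn_threshold:
        pkt.ecn = True
    return pkt


class Receiver:
    """Receiving host: converts ECN marks into CNPs, at most one per interval per flow."""

    def __init__(self, cnp_interval: float) -> None:
        self.cnp_interval = cnp_interval
        self.last_cnp_time: Dict[int, float] = {}

    def on_packet(self, pkt: Packet, now: float) -> Optional[dict]:
        if not pkt.ecn:
            return None  # no congestion signal, nothing to feed back
        last = self.last_cnp_time.get(pkt.flow_id)
        if last is not None and now - last < self.cnp_interval:
            return None  # a CNP for this flow was sent recently
        self.last_cnp_time[pkt.flow_id] = now
        return {"type": "CNP", "flow_id": pkt.flow_id, "sent_at": now}
```

The CNP returned by `on_packet` travels back toward the sender host through the source switch, which is where the later steps decide whether to forward or discard it.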
Step 2, the lossless data center network performs load balancing, and the data flow is switched from the old path to a new path;
In a specific implementation, at a certain moment the lossless data center network performs load balancing, and the data flow is switched from the original old path to a new path.
Step 3, outdated CNPs are distinguished, and the sending end adjusts its rate according to the CNPs.
On the basis of the above embodiment, the step 3 specifically includes:
Judging whether the network transmission delay can be accurately monitored; if so, performing rate adjustment according to the time point at which the CNP arrives; if not, constructing a time record table based on a notification-reply mode and performing rate adjustment.
Further, the step of performing rate adjustment according to the time point at which the CNP arrives comprises:
The source switch records the time point $t_{\mathrm{CNP}}$ at which a CNP arrives and the earliest time point $t_{\mathrm{new}}$ at which a CNP generated on the new path can arrive;
Comparing $t_{\mathrm{CNP}}$ with $t_{\mathrm{new}}$: if $t_{\mathrm{CNP}}$ is before $t_{\mathrm{new}}$, the CNP is judged to be an outdated CNP generated on the old path, and the source switch selectively discards it;
If $t_{\mathrm{CNP}}$ is after $t_{\mathrm{new}}$, the CNP is judged to be a CNP generated on the new path, the source switch sends it to the sending end, and the sending end adjusts its rate according to the CNP.
Further, the step of constructing a time record table based on the notification-reply mode and performing rate adjustment comprises:
When a data flow switches paths, the source switch immediately sends a path switching notification message to the destination switch;
After receiving the path switching notification message from the source switch, the destination switch sends a path switching reply message to the source switch;
A CNP that arrives before the source switch's state lookup shows that the path switching reply message has been received is treated as an outdated CNP; the number of outdated CNPs is counted, the average number of outdated CNP arrivals is calculated, and both values are updated in a data table; the source switch calculates the outdated CNP discard probability from the average number of outdated CNP arrivals and discards outdated CNPs according to the comparison of the outdated CNP discard probability with a threshold; a CNP that arrives after the source switch's state lookup shows that the path switching reply message has been received is treated as a normal CNP, which the source switch sends to the sending end, and the sending end adjusts its rate according to the CNP.
Further, the average number of outdated CNP arrivals $\mathit{avgCNP}$ is expressed as
$$\mathit{avgCNP} = (1 - w)\cdot \mathit{avgCNP} + w \cdot \mathit{CountP}$$
wherein $\mathit{CountP}$ is the number of outdated CNPs that actually arrive, and $w$ is the weight between the actual number of outdated CNP arrivals and the average.
Further, the outdated CNP discard probability $P$ is expressed as
$$P = P_{\max} \cdot \frac{\mathit{avgCNP}}{T}$$
wherein $P_{\max}$ is the maximum discard probability of an outdated CNP, and $T$ represents a threshold.
Further, the step of discarding outdated CNPs according to the comparison of the outdated CNP discard probability with the threshold comprises:
When the number of outdated CNPs has not reached the threshold $T$, outdated CNPs are discarded with the discard probability $P$, which increases linearly with the number of CNP arrivals; when the number of CNPs reaches the threshold $T$, all outdated CNPs are discarded, so as to smooth rate adjustment at the sending end.
In a specific implementation, to avoid the erroneous effect of outdated CNPs on rate adjustment at the sending end, outdated CNPs are distinguished under two different situations: the network transmission delay can be accurately monitored, or it cannot. As shown in Fig. 3, (a) is the architecture diagram for the situation in which the network transmission delay can be accurately monitored; the system is deployed in the source switch, and its main module is a CNP detection module, which is responsible for time recording and for distinguishing outdated CNPs. (b) is the architecture diagram for the situation in which the network transmission delay cannot be accurately monitored; the system is mainly deployed in the switches, and its main module is a CNP monitoring module, which is responsible for time recording (state recording) and for distinguishing outdated CNPs, assisted by the path switching notification message generated by the source switch and the path switching reply message generated by the destination switch.
Case one: the network transmission delay can be accurately monitored.
A. The source switch records the time point $t_{\mathrm{CNP}}$ at which a CNP arrives and the earliest time point $t_{\mathrm{new}}$ at which a CNP generated on the new path can arrive;
B. The time point $t_{\mathrm{CNP}}$ at which the CNP arrives is compared with the earliest time point $t_{\mathrm{new}}$ at which a CNP generated on the new path can arrive;
C. A CNP generated on the new path can only arrive after $t_{\mathrm{new}}$. Thus, if $t_{\mathrm{CNP}}$ is before $t_{\mathrm{new}}$, the CNP is an outdated CNP generated on the old path, and the source switch selectively discards it; if $t_{\mathrm{CNP}}$ is after $t_{\mathrm{new}}$, the CNP was generated on the new path, the source switch sends it to the sending end, and the sending end adjusts to an appropriate rate according to the CNP.
For example, in the case of being able to accurately monitor network transmission delay, the rate adjustment procedure may be as follows:
A. The source switch builds a time record table time(flowID, $t_{\mathrm{new}}$), wherein flowID represents the data flow number;
B. The source switch records the time point $t_{\mathrm{CNP}}$ at which a CNP arrives, and at the same time records the earliest time point $t_{\mathrm{new}}$ at which a CNP of the data flow on the new path can arrive, saving it in the $t_{\mathrm{new}}$ field of the time record table time(flowID, $t_{\mathrm{new}}$);
C. The time points $t_{\mathrm{CNP}}$ and $t_{\mathrm{new}}$ are compared. An outdated CNP can only arrive before $t_{\mathrm{new}}$. In short, if $t_{\mathrm{CNP}}$ is before $t_{\mathrm{new}}$, the CNP is considered an outdated CNP; the number of such CNPs is counted, the average number of outdated CNPs is calculated using formula (1), and the results are written to CountP and avgCNP in the data table data(PathID, CountP, avgCNP, T), wherein PathID is the path number, CountP is the number of outdated CNPs that have arrived, avgCNP is the average number of outdated CNPs, and T is an intermediate threshold; the CNP is then selectively discarded using formula (2) according to the data table, as shown in Fig. 5. If $t_{\mathrm{CNP}}$ is after $t_{\mathrm{new}}$, the CNP is considered to have been generated on the new path, the source switch forwards it to the sending end, and the sending end adjusts to an appropriate rate according to the CNP.
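The Case one bookkeeping can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented switch program: `TimedCnpFilter`, `PathStats`, and their methods are hypothetical names, and deriving the earliest new-path CNP arrival time as the switch time plus one measured round trip on the new path is an assumption (the description only says that this time point is recorded). A CNP arriving exactly at that time is treated as normal here, a case the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PathStats:
    """One row of the data table data(PathID, CountP, avgCNP, T)."""
    count_p: int = 0        # CountP: outdated CNPs that have arrived
    avg_cnp: float = 0.0    # avgCNP: maintained by the averaging step
    threshold: float = 0.0  # T: intermediate threshold


class TimedCnpFilter:
    """Source-switch view when network transmission delay can be monitored."""

    def __init__(self) -> None:
        self.time_table: Dict[int, float] = {}      # flowID -> earliest new-path CNP time
        self.data_table: Dict[int, PathStats] = {}  # PathID -> statistics

    def on_path_switch(self, flow_id: int, switch_time: float,
                       new_path_rtt: float) -> None:
        # Assumption: a new-path CNP cannot return sooner than one round trip.
        self.time_table[flow_id] = switch_time + new_path_rtt

    def classify(self, flow_id: int, old_path_id: int, t_cnp: float) -> str:
        t_new = self.time_table.get(flow_id)
        if t_new is not None and t_cnp < t_new:
            stats = self.data_table.setdefault(old_path_id, PathStats())
            stats.count_p += 1  # count the outdated CNP against the old path
            return "outdated"
        return "normal"  # forward to the sending end
```

A CNP classified as "outdated" is only a candidate for discarding; whether it is actually dropped depends on the discard probability computed from the data table.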
Case two: the network transmission delay cannot be accurately monitored.
A. When a data flow switches paths, the source switch immediately sends a path switching notification message to the destination switch;
B. After receiving the path switching notification message from the source switch, the destination switch sends a path switching reply message to the source switch;
C. A CNP generated on the new path can only arrive after the source switch has received the path switching reply message. Therefore, a CNP that arrives while the source switch's state lookup shows that the reply message has not yet been received is considered an outdated CNP, and the source switch selectively discards it. As shown on the time axis of Fig. 4, the data flow switches paths at one moment, and the path switching reply message arrives at the source switch at a later moment. A CNP that arrives at the source switch before the reply message is considered an outdated CNP generated on the old path and is selectively discarded by the source switch; a CNP that arrives at the source switch after the reply message is considered a CNP generated on the new path, i.e., a normal CNP. For a normal CNP, the source switch sends the CNP to the sending end, and the sending end adjusts to an appropriate rate according to the CNP.
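The notification-reply logic of Case two can be sketched as a small state table in Python. The class and method names are hypothetical; the two status fields mirror the Notice_status and reply_status flags of the time record table described for this mode, and message transport between switches is left out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FlowState:
    notice_status: int = 0  # 1 once the path switching notification was sent
    reply_status: int = 0   # 1 once the path switching reply was received


class NotifyReplyFilter:
    """Source-switch view when network transmission delay cannot be monitored."""

    def __init__(self) -> None:
        self.flows: Dict[int, FlowState] = {}

    def on_path_switch(self, flow_id: int) -> None:
        # The notification is sent immediately on a path switch; the reply is pending.
        self.flows[flow_id] = FlowState(notice_status=1, reply_status=0)

    def on_reply(self, flow_id: int) -> None:
        state = self.flows.get(flow_id)
        if state is not None and state.notice_status == 1:
            state.reply_status = 1

    def classify(self, flow_id: int) -> str:
        state = self.flows.get(flow_id)
        if state is not None and state.notice_status == 1 and state.reply_status == 0:
            return "outdated"  # arrived before the reply: old-path CNP
        return "normal"        # forward to the sending end
```

Because the reply travels the new path, its arrival bounds the earliest moment a new-path CNP could reach the source switch, so no delay measurement is needed.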
Selectively discarding outdated CNPs in this case prevents the congestion control algorithm from over-adjusting the flow rate on the new path. Under the DCQCN protocol, the sender decreases the flow rate in response to CNPs and increases it according to a timer and a byte counter; discarding all outdated CNPs would therefore make the flow rate increase too quickly, whereas forwarding all of them would make the flow rate too low. Therefore, following the RED algorithm, the average number of outdated CNP arrivals, $\mathit{avgCNP}$, is calculated first, i.e.

$$\mathit{avgCNP} = (1 - w)\cdot \mathit{avgCNP} + w\cdot \mathit{CountP} \qquad (1)$$

Wherein, $\mathit{CountP}$ is the number of outdated CNPs that actually arrive, and $w$ is the weight between the actual number of outdated CNP arrivals and the average. At the same time, a threshold $T$ is set, and the CNP discard probability $P$ for $\mathit{avgCNP} < T$ is calculated as follows:

$$P = P_{max}\cdot \frac{\mathit{avgCNP}}{T} \qquad (2)$$

Wherein, $P_{max}$ is the maximum drop probability of an outdated CNP. As shown in fig. 5, while the number of outdated CNPs has not reached the threshold $T$, outdated CNPs are discarded with probability $P$, and $P$ increases linearly with the number of CNP arrivals; once the number of CNPs reaches the threshold $T$, all outdated CNPs are discarded, which smooths the rate regulation at the transmitting end.
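The averaging and discard-probability rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the parameter names `w`, `threshold` and `p_max`, and the example values are hypothetical, and the sketch assumes a RED-style moving average with a probability that ramps linearly up to `p_max` below the threshold and becomes 1 once the threshold is reached.

```python
import random


def update_avg(avg_cnp: float, count_p: int, w: float) -> float:
    """RED-style moving average of the outdated-CNP arrival count (assumed form)."""
    return (1.0 - w) * avg_cnp + w * count_p


def drop_probability(avg_cnp: float, threshold: float, p_max: float) -> float:
    """Linear ramp below the threshold; discard everything at or above it."""
    if avg_cnp >= threshold:
        return 1.0
    return p_max * avg_cnp / threshold


def should_drop(avg_cnp: float, threshold: float, p_max: float,
                rng=random.random) -> bool:
    """Draw a uniform sample in [0, 1) and compare it with the discard probability."""
    return rng() < drop_probability(avg_cnp, threshold, p_max)


if __name__ == "__main__":
    avg = 0.0
    # Hypothetical parameters chosen only to make the ramp visible.
    for count in range(1, 9):
        avg = update_avg(avg, count, w=0.5)
        p = drop_probability(avg, threshold=4.0, p_max=0.5)
        print(f"CountP={count} avgCNP={avg:.3f} P={p:.3f}")
```

Passing `rng` as a parameter keeps the random draw replaceable, which makes the selective-discard decision easy to check deterministically.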
For example, in the case where the network transmission delay cannot be accurately monitored, the rate adjustment procedure may be as follows:
A. The source switch builds a time record table time(flowID, $t_{reply}$, Notice_status, reply_status) according to the notification-reply mode, wherein flowID is the data stream number, $t_{reply}$ is the (updatable) time at which the source switch receives the path switching reply message, Notice_status indicates whether the source switch has sent a path switching notification message (0 indicates not sent, 1 indicates sent, initially set to 0), and reply_status indicates whether the source switch has received a path switching reply message from the destination switch (0 indicates not received, 1 indicates received, initially set to 0). After the data flow switches paths, the notification-reply mode uses the INT technology of the programmable switch to append special mark information, called Notify, to the tail of the first data packet on the new path, thereby generating a notification message. This message serves as the path switching notification message and is sent from the source switch to the destination switch, and Notice_status in the time record table time is set to 1 to indicate that the path switching notification message has been sent;
B. After receiving the data packet carrying the Notify special mark information sent by the source switch, the destination switch uses ingress mirroring to copy the data packet and swaps its source and destination IP addresses, taking the original destination IP address as the source IP address and the original source IP address as the destination IP address. It then removes the payload and adds the Reply special mark information to the copied packet, thereby generating a Reply-type message that is sent to the source switch as the path switching reply message. The path switching notification message and the path switching reply message together form the notification and reply mode after path switching, which is called the notification-reply mode;
C. After receiving the path switching reply message sent by the destination switch, the source switch records the time point $t_{reply}$, stores it in the $t_{reply}$ field of the time record table time, and sets reply_status in the time record table time to 1 to indicate that the path switching reply message has been received. If a CNP arrives at the source switch while reply_status is not yet 1, the CNP is considered an outdated CNP; the source switch counts the number of such CNPs, calculates the average number of outdated CNPs using equation (1), and updates CountP and avgCNP in the data table data(PathID, CountP, avgCNP, T), where PathID is the path number, CountP is the number of outdated CNPs that have arrived, avgCNP is the average number of outdated CNPs, and T is an intermediate threshold. According to the data table data, the source switch selectively discards the CNP using equation (2), as shown in FIG. 5. Otherwise, the CNP is considered a CNP generated on the new path; the source switch forwards the CNP to the transmitting end, and the transmitting end adjusts its rate according to the CNP.
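Steps A to C can be sketched as a small state table in Python. This is only an illustrative model under stated assumptions: a real deployment would run in the data plane of a programmable switch, and the class names, method names, the dictionary-based packet representation, the `t_reply` field name, and the per-flow (rather than per-path) outdated counter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TimeRecord:
    """One row of the time record table (field names are assumptions)."""
    flow_id: int
    t_reply: Optional[float] = None  # time the Reply was received
    notice_status: int = 0           # 1 once the Notify message was sent
    reply_status: int = 0            # 1 once the Reply message was received


class SourceSwitch:
    def __init__(self) -> None:
        self.time_table: Dict[int, TimeRecord] = {}
        self.outdated_count: Dict[int, int] = {}

    def on_path_switch(self, flow_id: int) -> None:
        """Step A: start a new row and mark the Notify message as sent."""
        self.time_table[flow_id] = TimeRecord(flow_id, notice_status=1)
        self.outdated_count[flow_id] = 0

    def on_reply(self, flow_id: int, now: float) -> None:
        """Step C: record the Reply arrival time and set reply_status."""
        rec = self.time_table[flow_id]
        rec.t_reply = now
        rec.reply_status = 1

    def classify_cnp(self, flow_id: int) -> str:
        """A CNP seen after Notify but before Reply is treated as outdated."""
        rec = self.time_table.get(flow_id)
        if rec is not None and rec.notice_status == 1 and rec.reply_status == 0:
            self.outdated_count[flow_id] += 1
            return "outdated"
        return "normal"


def make_reply(notify_pkt: dict) -> dict:
    """Step B: mirror the packet, swap IP addresses, strip payload, mark Reply."""
    return {
        "src_ip": notify_pkt["dst_ip"],
        "dst_ip": notify_pkt["src_ip"],
        "mark": "Reply",
        "payload": b"",
    }
```

A CNP classified as "outdated" would then be passed to the selective-discard decision, while a "normal" CNP is forwarded to the sender unchanged.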
The method of the embodiment of the invention designs an in-network mechanism for shielding outdated congestion feedback in a data center lossless network, based on a programmable switch in the network data plane. It addresses the erroneous influence that outdated CNPs (Congestion Notification Packets) generated on the old path have on the sender's flow rate adjustment during RDMA (Remote Direct Memory Access) load balancing, optimizes existing DCQCN-based lossless network load balancing schemes, and further improves transmission performance. The embodiment of the invention was tested in a large-scale data center lossless network environment simulated with NS3, using an 8x8 Leaf-Spine network topology and a cluster of 128 servers (each Leaf switch is connected to 16 servers) running DCQCN, which demonstrates that the scheme of the invention is practical. As shown in fig. 6, in a comparison based on ConWeave, a representative lossless network load balancing scheme, under a realistic test workload at a high load of 80%, the present scheme (i.e., cnpMask) reduces the average flow completion time and the tail latency by about 10% compared with not handling outdated congestion feedback.
The method for blocking outdated congestion feedback in a data center lossless network provided by this embodiment targets DCQCN-based lossless networks and addresses the problem that, when paths are switched for load balancing, outdated CNPs generated on the old path affect the sender's flow rate adjustment on the new path. After the path is switched, the source switch either compares the time point at which a CNP reaches the source switch with the earliest time point at which a CNP generated on the new path can reach the source switch, or checks the state of the path switching reply message in the source switch. On that basis it selectively discards unnecessary outdated CNPs, which reduces their erroneous influence on the flow rate adjustment of the new path, helps improve the bandwidth utilization of lossless network links, and reduces RDMA transmission delay, all without modifying the host network protocol.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (4)
1. A method for blocking out outdated congestion feedback in a data center lossless network, comprising:
Step 1, during data stream transmission in a data center lossless network, when a link is congested, a switch marks data packets with an ECN mark, and when the receiving end host detects the ECN mark, the receiving end host sends a CNP to the source switch to notify the sending end to adjust its rate;
Step 2, the data center lossless network performs load balancing, and data packets are switched from the old path to the new path;
Step 3, outdated CNPs are distinguished, and the sending end adjusts its rate according to the CNPs;
The step 3 specifically includes:
Judging whether network transmission delay can be accurately monitored, if so, carrying out rate adjustment according to the time point reached by the CNP, and if not, constructing a time record table according to the notification and reply modes and carrying out rate adjustment;
The step of rate adjustment according to the time point reached by the CNP comprises the following steps:
Recording, by the source switch, the time point at which the CNP arrives and the earliest time point at which a CNP generated on the new path can arrive;
Comparing the two time points; if the CNP arrival time point is before the earliest arrival time point of a CNP generated on the new path, judging that the CNP is an outdated CNP generated on the old path, and selectively discarding the CNP by the source switch;
If the CNP arrival time point is after the earliest arrival time point of a CNP generated on the new path, judging that the CNP is a CNP generated on the new path, sending the CNP to the sending end by the source switch, and adjusting the rate by the sending end according to the CNP;
the step of constructing a time record table according to the notification and reply modes and adjusting the speed comprises the following steps:
when the data flow generates path switching, the source switch immediately sends a path switching notification message to the destination switch;
the destination switch receives the path switching notification message sent by the source switch and then sends a path switching reply message to the source switch;
taking a CNP that arrives before the source switch has received the path switching reply message as an outdated CNP; counting the number of outdated CNPs, calculating the outdated CNP arrival number average, and updating the outdated CNP arrival number average in a data table; calculating, by the source switch, an outdated CNP discard probability according to the outdated CNP arrival number average, and performing a discard operation on the outdated CNP according to a comparison of the outdated CNP discard probability with a threshold; and taking a CNP that arrives after the source switch has received the path switching reply message as a normal CNP, the source switch sending the CNP to the sending end, and the sending end adjusting its rate according to the CNP.
2. The method of claim 1, wherein the expression of the outdated CNP arrival number average $\mathit{avgCNP}$ is

$$\mathit{avgCNP} = (1 - w)\cdot \mathit{avgCNP} + w\cdot \mathit{CountP}$$

Wherein, $\mathit{CountP}$ is the number of outdated CNPs that actually arrive, and $w$ is the weight between the actual number of outdated CNP arrivals and the average.
3. The method of claim 2, wherein the expression of the outdated CNP discard probability $P$ is

$$P = P_{max}\cdot \frac{\mathit{avgCNP}}{T}$$

Wherein, $P_{max}$ is the maximum drop probability of an outdated CNP, and $T$ represents a threshold.
4. The method of claim 3, wherein the step of performing a discard operation on the stale CNP based on a comparison of the stale CNP discard probability to a threshold comprises:
When the number of outdated CNPs has not reached the threshold $T$, outdated CNPs are discarded with the discard probability $P$, and $P$ increases linearly with the number of CNP arrivals; once the number of CNPs reaches the threshold $T$, all outdated CNPs are discarded, so as to smooth the rate regulation at the transmitting end.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410689076.3A CN118282968B (en) | 2024-05-30 | 2024-05-30 | Method for shielding outdated congestion feedback in network of data center under lossless network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118282968A (en) | 2024-07-02 |
CN118282968B (en) | 2024-08-06 |
Family
ID=91643648
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410689076.3A Active CN118282968B (en) | 2024-05-30 | 2024-05-30 | Method for shielding outdated congestion feedback in network of data center under lossless network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118282968B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110351196A (en) * | 2018-04-02 | 2019-10-18 | 华中科技大学 | Load-balancing method and system based on accurate congestion feedback in cloud data center |
CN114679408A (en) * | 2022-05-27 | 2022-06-28 | 湖南工商大学 | Path switching-aware data center congestion control method and system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130024999A (en) * | 2011-08-26 | 2013-03-11 | 목포대학교산학협력단 | Tcp performance improvement method in mobile ip packet buffering method for marine telematics |
US12074799B2 (en) * | 2020-03-04 | 2024-08-27 | Intel Corporation | Improving end-to-end congestion reaction using adaptive routing and congestion-hint based throttling for IP-routed datacenter networks |
CN115883492B (en) * | 2022-11-18 | 2024-07-09 | 浪潮思科网络科技有限公司 | RoCE-SAN lossless storage network fault convergence method under MLAG environment |
CN116915706B (en) * | 2023-09-13 | 2023-12-26 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Data center network congestion control method, device, equipment and storage medium |
CN117768392A (en) * | 2023-12-15 | 2024-03-26 | 重庆邮电大学 | PTP synchronous message transmission optimization method under network congestion scene |
Also Published As
Publication number | Publication date |
---|---|
CN118282968A (en) | 2024-07-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||