
WO2023005335A1 - Message transmission method and related apparatus - Google Patents


Info

Publication number
WO2023005335A1
WO2023005335A1 PCT/CN2022/091979 CN2022091979W WO2023005335A1 WO 2023005335 A1 WO2023005335 A1 WO 2023005335A1 CN 2022091979 W CN2022091979 W CN 2022091979W WO 2023005335 A1 WO2023005335 A1 WO 2023005335A1
Authority
WO
WIPO (PCT)
Prior art keywords
rdma
message
network device
transmission identifier
forwarding path
Prior art date
Application number
PCT/CN2022/091979
Other languages
English (en)
French (fr)
Inventor
冀智刚
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to JP2024505160A (published as JP2024527081A)
Priority to EP22847949.9A (published as EP4366266A4)
Publication of WO2023005335A1
Priority to US18/423,689 (published as US20240214312A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/302 Route determination based on requested QoS
    • H04L45/306 Route determination based on the nature of the carried application
    • H04L45/38 Flow based routing
    • H04L45/66 Layer 2 routing, e.g. in Ethernet based MAN's
    • H04L45/74 Address processing for routing
    • H04L49/00 Packet switching elements
    • H04L49/10 Packet switching elements characterised by the switching fabric construction
    • H04L49/113 Arrangements for redundant switching, e.g. using parallel planes
    • H04L49/118 Address processing within a device, e.g. using internal ID or tags for routing within a switch
    • H04L49/35 Switches specially adapted for specific applications
    • H04L49/356 Switches specially adapted for specific applications for storage area networks
    • H04L49/357 Fibre channel switches
    • H04L49/358 Infiniband Switches
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/06 Notations for structuring of protocol data, e.g. abstract syntax notation one [ASN.1]

Definitions

  • the present application relates to the technical field of communications, and in particular to a message transmission method and related devices.
  • TCP/IP Transmission Control Protocol/Internet Protocol
  • RDMA Remote Direct Memory Access
  • the transmission of RDMA data is carried on the connection established by the application. Once the application establishes a connection between two devices, all RDMA data between the two devices is transmitted through the network over the same forwarding path. This easily leads to uneven network load and limits the efficient use of network bandwidth resources.
  • the present application provides a message transmission method in which packets that need to be transmitted in order are marked by a transmission identifier carried in the packet, and the network device selects the forwarding path of each packet based on that transmission identifier. Packets that need to be transmitted in sequence are thus forwarded through the same forwarding path, while packets with different transmission identifiers can be forwarded through different forwarding paths, achieving network load balancing and improving the utilization rate of network bandwidth resources.
  • the first aspect of the present application provides a message transmission method.
  • the first network device acquires an RDMA packet, where the RDMA packet includes a transmission identifier, where RDMA packets with the same transmission identifier need to be transmitted in sequence.
  • the RDMA packet acquired by the first network device is a packet for transmitting RDMA data, and the RDMA packet includes a transmission identifier and RDMA data.
  • the RDMA data is the data that the source server needs to send to the destination server.
  • the first network device determines a target forwarding path according to the transmission identifier and the destination address of the RDMA message.
  • the destination address of the RDMA message is used to determine the forwarding paths that can be used to forward the RDMA message.
  • the transmission identifier in the RDMA message is used to select the target forwarding path from among those forwarding paths.
  • the first network device forwards the RDMA message through the target forwarding path.
  • the packets that need to be transmitted in sequence are marked by the transmission identifier carried in the RDMA packet, and the network device selects the forwarding path of the RDMA packet based on that transmission identifier. RDMA packets that need to be transmitted in sequence are forwarded through the same forwarding path, while RDMA packets with different transmission identifiers can be forwarded through different forwarding paths, achieving network load balancing and improving the utilization rate of network bandwidth resources.
  • the first network device determines, according to the destination address of the RDMA message, multiple forwarding paths that can be used to forward the RDMA message; the first network device then determines the target forwarding path from among the multiple forwarding paths.
  • each forwarding path in the multiple forwarding paths is identified by an egress port in the first network device.
  • the first network device may search a routing table according to the destination address of the RDMA message to determine multiple egress ports that can be used to forward the RDMA message, where the multiple egress ports correspond to the multiple forwarding paths respectively.
  • the first network device may select a target egress port from the multiple egress ports according to the transmission identifier, and forward the RDMA packet through the target egress port.
  • the first network device determines the target forwarding path from among the multiple forwarding paths according to the transmission identifier and one or more of the source address, destination address, source port number, destination port number, and protocol number of the RDMA packet.
  • the first network device may use the transmission identifier together with one or more of the source address, destination address, source port number, destination port number, and protocol number of the RDMA packet as hash factors, calculate a hash value over these hash factors, and determine the target forwarding path from among the multiple forwarding paths according to the calculated hash value.
  • the first network device sequentially acquires multiple RDMA packets that all include the transmission identifier and that have the same destination address; the first network device sequentially forwards the multiple RDMA packets through the target forwarding path.
  • the transmission identifier is located in the RDMA header of the RDMA packet.
  • the RDMA header may refer to a standard header of the RDMA packet, for example, the Base Transport Header (BTH) of the RDMA packet.
  • BTH Base Transport Header
  • the transmission identifier is located in one or more reserved fields in the RDMA header.
  • the RDMA header includes one or more reserved fields, and the transmission identifier can be represented by values in the one or more reserved fields.
  • the transmission identifier is located in an extension field in the RDMA header.
  • the RDMA message includes an RDMA over Converged Ethernet (RoCE) message, an iWARP message, or an InfiniBand message.
  • when acquiring the RDMA message, the first network device obtains a message sent by the application layer; the first network device generates at least one RDMA message according to the message, and the at least one RDMA message includes the same transmission identifier.
  • the first network device receives the RDMA packet sent by the second network device, and the second network device is an upstream device of the first network device.
  • the second aspect of the present application provides a network device, including an acquisition unit, a processing unit, and a sending unit. The acquisition unit is configured to acquire an RDMA message, where the RDMA message includes a transmission identifier and RDMA messages with the same transmission identifier need to be transmitted in sequence; the processing unit is configured to determine a target forwarding path according to the transmission identifier and the destination address of the RDMA message; the sending unit is configured to forward the RDMA message through the target forwarding path.
  • the processing unit is further configured to: determine, according to the destination address of the RDMA message, multiple forwarding paths that can be used to forward the RDMA message; and determine the target forwarding path from among the multiple forwarding paths.
  • the processing unit is specifically configured to determine the target forwarding path from among the multiple forwarding paths according to the transmission identifier and one or more of the source address, destination address, source port number, destination port number, and protocol number of the RDMA packet.
  • the obtaining unit is further configured to sequentially obtain multiple RDMA messages that all include the transmission identifier and that have the same destination address; the processing unit is further configured to sequentially forward the multiple RDMA packets through the target forwarding path.
  • the transmission identifier is located in the RDMA header of the RDMA message.
  • the transmission identifier is located in one or more reserved fields in the RDMA header.
  • the transmission identifier is located in an extension field in the RDMA header.
  • the RDMA message includes an RDMA over Converged Ethernet (RoCE) message, an iWARP message, or an InfiniBand message.
  • the obtaining unit is configured to obtain a message sent by the application layer; the processing unit is configured to generate at least one RDMA message according to the message, and the at least one RDMA message includes the same transmission identifier.
  • the acquiring unit is configured to receive the RDMA message sent by the second network device, where the second network device is an upstream device of the first network device.
  • a third aspect of the present application provides a network device, where the network device includes: a processor, configured to enable the network device to implement the method described in the foregoing first aspect or any possible implementation manner of the first aspect.
  • the device may further include a memory, and the memory is coupled to the processor. When the processor executes the instructions stored in the memory, the network device may implement the method described in any possible implementation manner of the foregoing first aspect.
  • the device may further include a communication interface, which is used for the device to communicate with other devices.
  • the communication interface may be a transceiver, a circuit, a bus, a module, or other types of communication interfaces.
  • Coupling in this application is an indirect coupling or connection between devices, units or modules, which may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the fourth aspect of the present application provides a computer storage medium, which may be non-volatile. Computer-readable instructions are stored in the computer storage medium, and when the computer-readable instructions are executed by a processor, the method described in the first aspect or any possible implementation of the first aspect is implemented.
  • the fifth aspect of the present application provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method described in the first aspect or any possible implementation manner of the first aspect.
  • a sixth aspect of the present application provides a network system, and the network system includes a plurality of network devices according to the second aspect or the third aspect above.
  • FIG. 1 is a schematic diagram of an application architecture of a message transmission method provided in an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a message transmission method 200 provided in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the format of an RDMA message provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the format of an RDMA header provided in an embodiment of the present application.
  • FIG. 5 is a schematic diagram, provided by an embodiment of the present application, of a source network device generating packets according to a message.
  • FIG. 6 is a schematic diagram of a message transmission process provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another message transmission process provided by the embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a network device 800 provided in an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a network device 900 provided in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a network device 1000 provided by an embodiment of the present application.
  • the naming or numbering of the steps in this application does not mean that the steps in the method flow must be executed in the time/logic sequence indicated by the naming or numbering.
  • the execution order of named or numbered steps may be changed according to the technical purpose to be achieved, as long as the same or similar technical effect can be achieved.
  • the division of units presented in this application is a logical division. In actual application, there may be other division methods: for example, multiple units may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be implemented through some interfaces, and the indirect coupling or communication connection between units may be electrical or take other similar forms; this application does not limit this.
  • the units or subunits described as separate components may or may not be physically separated, may or may not be physical units, or may be distributed across multiple circuit units; some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of this application.
  • RDMA enables data to be moved quickly from one system into the memory of a remote system over a network without any impact on the operating system. RDMA eliminates external memory copies and context-switch operations, thereby freeing memory bandwidth and CPU cycles and improving the efficiency of data transfer. At present, applications in fields such as high-performance computing and artificial intelligence widely use the RDMA transmission protocol for data transmission.
  • the transmission of RDMA data is carried on the connection established by the application. Once the application establishes a connection between two devices, all RDMA data between the two devices is transmitted through the network over the same forwarding path. This easily leads to uneven network load and limits the efficient use of network bandwidth resources.
  • the embodiment of the present application provides a message transmission method in which packets that need to be transmitted in order are marked by a transmission identifier carried in the packet, and the network device selects the forwarding path of each packet based on that transmission identifier. Packets that need to be transmitted in sequence are thus forwarded through the same forwarding path, while packets with different transmission identifiers can be forwarded through different forwarding paths, achieving network load balancing and improving the utilization rate of network bandwidth resources.
  • FIG. 1 is a schematic diagram of an application architecture of a message transmission method provided in an embodiment of the present application.
  • the application architecture includes a source network device, intermediate network devices 1 to 4, and a destination network device.
  • the source network device and the destination network device may be servers, and the intermediate network devices 1 to 4 may be physical devices such as routers, switches, or gateways.
  • the source network device needs to send an RDMA message to the destination network device, and the RDMA message passes through the intermediate network devices between the source network device and the destination network device.
  • the network device responsible for forwarding the RDMA message selects the forwarding path of the RDMA message based on the transmission identifier in the RDMA message, so that RDMA packets with the same transmission identifier are forwarded over the same forwarding path.
  • FIG. 2 is a schematic flowchart of a packet transmission method 200 provided in an embodiment of the present application. As shown in FIG. 2, the packet transmission method 200 includes the following steps 201-203.
  • Step 201: the first network device acquires an RDMA packet, where the RDMA packet includes a transmission identifier, and RDMA packets with the same transmission identifier need to be transmitted sequentially.
  • the RDMA packet acquired by the first network device is a packet for transmitting RDMA data.
  • the RDMA packet includes a transmission identifier and RDMA data.
  • the RDMA data is the data that the source server needs to send to the destination server.
  • the destination server accesses the data on the source server through RDMA technology, and the source server sends the RDMA data requested by the destination server to the destination server.
  • the transmission identifier may also be referred to as a transaction identifier.
  • the first network device may be, for example, a server.
  • the application layer in the first network device sends a message (message) to the transport layer in the first network device, and the message includes RDMA data sent by the server.
  • after obtaining the message sent by the application layer, the first network device generates at least one RDMA packet according to the message, and the at least one RDMA packet includes the same transmission identifier.
  • the RDMA data in one message sent by the application layer may be divided by the first network device into one or more RDMA packets for transmission.
  • the maximum transmission unit (MTU) refers to the size of the largest data packet that can pass through the network.
  • when the data volume of the message obtained by the first network device is greater than the MTU, the first network device generates multiple RDMA packets according to the message, each carrying part of the RDMA data in the message; when the data volume of the message is less than or equal to the MTU, the first network device generates a single RDMA packet according to the message, and that RDMA packet carries all the RDMA data in the message.
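The MTU-based segmentation rule above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the `RdmaPacket` type and `segment_message` function are hypothetical, and `mtu` is treated as the maximum payload per packet, whereas a real RoCE or InfiniBand stack would subtract header overhead from the path MTU.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RdmaPacket:
    transmission_id: int  # identical for every packet cut from one message
    index: int            # position of this packet within the message
    payload: bytes


def segment_message(data: bytes, transmission_id: int, mtu: int) -> List[RdmaPacket]:
    """Split one application-layer message into RDMA packets (illustrative)."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    if len(data) <= mtu:
        # Fits in one packet: that packet carries all of the message's RDMA data.
        return [RdmaPacket(transmission_id, 0, data)]
    # Larger than the MTU: each packet carries part of the data, all share one identifier.
    return [
        RdmaPacket(transmission_id, i, data[off:off + mtu])
        for i, off in enumerate(range(0, len(data), mtu))
    ]
```

For example, a 2500-byte message with an `mtu` of 1024 yields three packets with payloads of 1024, 1024 and 452 bytes, all carrying transmission identifier 7.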
  • when the first network device generates multiple RDMA packets according to one message sent by the application layer, the first network device adds the same transmission identifier to the multiple RDMA packets. Because the RDMA data in these RDMA packets belongs to the same message, the packets need to be transmitted in sequence so that the destination server can receive the complete RDMA data of the message. Adding the same transmission identifier to all of them ensures that they are transmitted in sequence.
  • the first network device may assign a transmission identifier to the message; every RDMA packet that the first network device generates from the message then includes the transmission identifier corresponding to that message, which ensures that all RDMA packets corresponding to the message include the same transmission identifier.
  • when the first network device acquires multiple messages from the application layer and those messages are indicated to be transmitted sequentially, the first network device assigns the same transmission identifier to all of them. In this way, the RDMA packets that the first network device generates from these messages all include the same transmission identifier. For example, when the application layer sends multiple messages whose RDMA data are interrelated, the application layer indicates that these messages need to be transmitted in order.
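A minimal sketch of the identifier assignment described above, assuming a hypothetical `TransmissionIdAllocator` class and assuming the application marks related messages with a shared group key (the text does not say how the application layer indicates ordering). Each unrelated message gets a fresh identifier, while messages in the same ordered group reuse one; the 17-bit default width matches the largest identifier length mentioned for the reserved-field encoding.

```python
import itertools
from typing import Dict, Optional


class TransmissionIdAllocator:
    """Hypothetical allocator for transmission identifiers (illustrative sketch)."""

    def __init__(self, id_bits: int = 17) -> None:
        self._mask = (1 << id_bits) - 1          # identifier must fit the header field
        self._counter = itertools.count(1)       # wraps within the mask (simplification)
        self._by_group: Dict[str, int] = {}      # ordered group key -> shared identifier

    def _next_id(self) -> int:
        return next(self._counter) & self._mask

    def assign(self, ordered_group: Optional[str] = None) -> int:
        """Return the identifier to stamp on all packets of one message.

        Messages passed with the same `ordered_group` share one identifier,
        so their packets are later hashed onto the same forwarding path.
        """
        if ordered_group is None:
            return self._next_id()               # independent message: fresh identifier
        if ordered_group not in self._by_group:
            self._by_group[ordered_group] = self._next_id()
        return self._by_group[ordered_group]
```

Calling `assign("job-1")` twice returns the same value, while `assign()` without a group returns a new one.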
  • the first network device may be, for example, a physical device such as a router, a switch, or a gateway, or may be a virtual device that supports packet forwarding.
  • the first network device may obtain the RDMA message as follows: the first network device receives the RDMA message sent by the second network device, where the second network device is an upstream device of the first network device.
  • the second network device may be, for example, a physical device such as a source server, a router, a switch, or a gateway, or may be a virtual device that supports packet forwarding.
  • both the first network device and the second network device are network devices on the forwarding path of the RDMA message: the second network device forwards the RDMA message to the first network device, and the first network device continues to forward the RDMA message to its next-hop device.
  • the RDMA message described in this embodiment may include an RDMA over Converged Ethernet (RoCE) message, an iWARP message, or an InfiniBand message.
  • the RoCE message is a message transmitted based on the RoCE protocol;
  • the iWARP message is a message transmitted based on the iWARP protocol;
  • the InfiniBand message is a message transmitted based on the InfiniBand protocol.
  • the RoCE protocol is a network protocol that allows RDMA to be performed over Ethernet.
  • the iWARP protocol is a network protocol that allows RDMA to be performed over the Transmission Control Protocol (TCP).
  • TCP Transmission Control Protocol
  • the iWARP protocol is in effect an Internet wide-area RDMA protocol.
  • the InfiniBand protocol is a new-generation network protocol that supports RDMA.
  • the RDMA message described in this embodiment may also be a message transmitted based on other protocols, and this embodiment does not limit the specific form of the RDMA message.
  • Step 202: the first network device determines a target forwarding path according to the transmission identifier and the destination address of the RDMA message.
  • the first network device determines the target forwarding path of the RDMA message based on the transmission identifier and the destination address in the RDMA message.
  • the destination address of the RDMA message is used to determine the forwarding paths that can be used to forward the RDMA message.
  • the transmission identifier in the RDMA message is used to select the target forwarding path from among those forwarding paths.
  • the first network device may determine multiple forwarding paths that can be used to forward the RDMA message according to the destination address of the RDMA message. Wherein, each forwarding path in the multiple forwarding paths is identified by an egress port in the first network device.
  • the first network device may search a routing table according to the destination address of the RDMA message to determine multiple egress ports that can be used to forward the RDMA message, where the multiple egress ports correspond to the multiple forwarding paths respectively.
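The routing-table lookup can be illustrated with a longest-prefix match that returns a set of equal-cost egress ports, each identifying one candidate forwarding path. The table contents, port names and the `candidate_egress_ports` function are hypothetical; a real device performs this lookup in its forwarding hardware.

```python
import ipaddress
from typing import Dict, List

# Hypothetical routing table: destination prefix -> equal-cost egress ports.
ROUTES: Dict[ipaddress.IPv4Network, List[str]] = {
    ipaddress.ip_network("10.0.0.0/8"): ["eth1"],
    ipaddress.ip_network("10.1.0.0/16"): ["eth1", "eth2", "eth3", "eth4"],
}


def candidate_egress_ports(dst: str, routes=ROUTES) -> List[str]:
    """Longest-prefix match on the destination address.

    Each returned egress port identifies one candidate forwarding path.
    """
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    best = max(matches, key=lambda net: net.prefixlen)
    return list(routes[best])
```

Here a destination in 10.1.0.0/16 yields four candidate paths, while other 10.0.0.0/8 destinations yield one.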
  • the first network device determines the target forwarding path from the multiple forwarding paths according to the transmission identifier. That is, the first network device selects a target egress port from the multiple egress ports according to the transmission identifier, and forwards the RDMA packet through the target egress port.
  • the first network device may use the transmission identifier as a hash factor, calculate a hash value of the transmission identifier, and determine the target forwarding path from the multiple forwarding paths according to that hash value.
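Selecting the egress port from the hash of the transmission identifier alone might look like the sketch below. CRC-32 from Python's `zlib` module stands in for the device's hash function, which the text does not specify, and taking the hash modulo the number of candidate ports is one common way to map a hash onto a path, assumed here; `select_by_transmission_id` is a hypothetical name.

```python
import zlib
from typing import Sequence


def select_by_transmission_id(transmission_id: int, ports: Sequence[str]) -> str:
    """Pick the egress port (forwarding path) from the transmission identifier alone."""
    if not ports:
        raise ValueError("no candidate egress ports")
    # CRC-32 stands in for the device's (unspecified) hash function.
    digest = zlib.crc32(transmission_id.to_bytes(4, "big"))
    return ports[digest % len(ports)]
```

Because the hash depends only on the identifier, every packet carrying the same identifier maps to the same port, while different identifiers can spread across the candidate ports.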
  • the first network device may use the transmission identifier together with one or more of the source address, destination address, source port number, destination port number, and protocol number of the RDMA packet as hash factors, calculate a hash value over these hash factors, and determine the target forwarding path from among the multiple forwarding paths according to the calculated hash value.
  • for example, the first network device may use the transmission identifier, the source address, and the destination address of the RDMA message as hash factors, calculate the hash value of these three hash factors, and determine the target forwarding path according to the calculated hash value.
  • for another example, the first network device may use the transmission identifier together with the source address, destination address, source port number, destination port number, and protocol number of the RDMA message as hash factors, calculate the hash value of these six hash factors, and determine the target forwarding path according to the calculated hash value.
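The three-factor and six-factor variants can be sketched by hashing the transmission identifier together with address and port fields. `FlowKey` and `select_path` are hypothetical names, CRC-32 again stands in for the unspecified hash, and appending the identifier bytes after the other fields is an arbitrary choice for this sketch; `use_five_tuple=False` corresponds to the three-factor example and `True` to the six-factor example.

```python
import zlib
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FlowKey:
    """Address/port fields of an RDMA packet used as extra hash factors."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def select_path(transmission_id: int, flow: FlowKey, ports: Sequence[str],
                use_five_tuple: bool = True) -> str:
    """Hash the transmission identifier plus address fields onto one egress port."""
    fields = [flow.src_ip, flow.dst_ip]          # source and destination address
    if use_five_tuple:
        fields += [str(flow.src_port), str(flow.dst_port), str(flow.protocol)]
    # The transmission identifier is appended as the final hash factor.
    hash_input = "|".join(fields).encode() + transmission_id.to_bytes(4, "big")
    return ports[zlib.crc32(hash_input) % len(ports)]
```

With this scheme, packets of the same RDMA connection (same five-tuple) but with different transmission identifiers can take different paths, whereas a plain five-tuple hash would pin the whole connection to one path.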
  • Step 203: the first network device forwards the RDMA message through the target forwarding path.
  • after determining the target forwarding path, the first network device forwards the RDMA packet along that path.
  • the target forwarding path is identified by a target egress port.
  • the first network device forwards the RDMA message through the target egress port, thereby forwarding the RDMA message to the next-hop device of the first network device.
  • RDMA messages that need to be transmitted in sequence are marked by the transmission identifier they carry, and the network device selects the forwarding path of each RDMA message based on that transmission identifier. RDMA packets that need to be transmitted in sequence are forwarded through the same forwarding path, while RDMA packets with different transmission identifiers can be forwarded through different forwarding paths, achieving network load balancing and improving the utilization rate of network bandwidth resources. Because the RDMA packets that need to be transmitted in sequence carry the same transmission identifier, they are all forwarded through the same forwarding path, which ensures that they are transmitted in sequence.
  • the first network device may acquire multiple RDMA packets in sequence that all include the transmission identifier and have the same destination address. Because these packets share both the destination address and the transmission identifier, the first network device determines the same target forwarding path for each of them from among the multiple forwarding paths, and then forwards them sequentially through that target forwarding path.
  • because the first network device selects the forwarding path of each RDMA message based on its transmission identifier, it can forward multiple RDMA messages that need to be transmitted sequentially, in order, over the same forwarding path, which ensures the in-order transmission of RDMA packets.
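The in-order property can be checked with a small simulation in which interleaved packets from two transmission identifiers are placed into one FIFO queue per egress port. This is an illustrative model only: `pick_port` and `forward_in_order` are hypothetical functions and CRC-32 is an assumed hash. It shows that all packets of one identifier share a queue and keep their arrival order, which is what preserves sequencing on a single path.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Packet = Tuple[int, int]  # (transmission_id, sequence number within that identifier)


def pick_port(transmission_id: int, ports: Sequence[str]) -> str:
    """Map a transmission identifier to one egress port via CRC-32."""
    return ports[zlib.crc32(transmission_id.to_bytes(4, "big")) % len(ports)]


def forward_in_order(packets: Iterable[Packet], ports: Sequence[str]) -> Dict[str, List[Packet]]:
    """Append each arriving packet to the FIFO queue of its egress port."""
    queues: Dict[str, List[Packet]] = defaultdict(list)
    for tid, seq in packets:
        queues[pick_port(tid, ports)].append((tid, seq))
    return queues
```

Feeding arrivals `[(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)]` over four ports places identifiers 0 and 1 on different ports in this sketch, so the two ordered streams share the load while each stays in sequence.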
  • the above describes the process of the network device forwarding the RDMA packet based on the transmission identifier in the RDMA packet, and the following describes the manner in which the RDMA packet carries the transmission identifier.
  • the transmission identifier in the RDMA message is located in the RDMA header of the RDMA message.
  • the RDMA header may refer to the header of the RDMA packet, for example, the Base Transport Header (BTH) of the RDMA packet.
  • BTH Base Transport Header
  • FIG. 3 is a schematic diagram of a format of an RDMA message provided in an embodiment of the present application.
  • the RDMA message includes an Ethernet header (Eth Header), an Internet Protocol header (IP Header), a User Datagram Protocol header (UDP Header), the BTH, and the payload.
  • the transmission identifier in the RDMA message may be located in the BTH of the RDMA message.
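To read the transmission identifier from the BTH, a forwarding device first has to locate the BTH behind the Ethernet, IP and UDP headers shown in FIG. 3. The sketch below assumes an untagged Ethernet frame carrying IPv4 and RoCEv2 (UDP destination port 4791); VLAN tags, IPv6 and error handling beyond basic checks are omitted, and `extract_bth` is a hypothetical helper.

```python
import struct

ETH_HEADER_LEN = 14         # untagged Ethernet II header
ROCEV2_UDP_PORT = 4791      # UDP destination port assigned to RoCEv2
BTH_LEN = 12                # Base Transport Header length in bytes


def extract_bth(frame: bytes) -> bytes:
    """Return the BTH of an untagged Ethernet/IPv4/UDP RoCEv2 frame."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:
        raise ValueError("only untagged IPv4 frames are handled in this sketch")
    ip_start = ETH_HEADER_LEN
    ihl = (frame[ip_start] & 0x0F) * 4          # IPv4 header length, options included
    if frame[ip_start + 9] != 17:               # IPv4 protocol field: 17 = UDP
        raise ValueError("not a UDP packet")
    udp_start = ip_start + ihl
    (dst_port,) = struct.unpack_from("!H", frame, udp_start + 2)
    if dst_port != ROCEV2_UDP_PORT:
        raise ValueError("not a RoCEv2 packet")
    bth_start = udp_start + 8                   # UDP header is 8 bytes
    return frame[bth_start:bth_start + BTH_LEN]
```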
  • the transmission identifier is located in one or more reserved fields in the RDMA header.
  • the RDMA header includes one or more reserved fields, and the transmission identifier can be represented by values in the one or more reserved fields.
  • FIG. 4 is a schematic diagram of a format of an RDMA header provided in an embodiment of the present application.
  • the RDMA header includes multiple reserved fields. The pad count field (2 bits), the first reserved field (8 bits), and the second reserved field (7 bits) are all reserved fields in the RDMA header, and the transmission identifier can be represented by the values in several of these reserved fields.
  • for example, when the transmission identifier is represented by the pad count field, the first reserved field, and the second reserved field, the transmission identifier is 17 bits long. As another example, when it is represented by the first reserved field and the second reserved field, it is 15 bits long.
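The following is a minimal sketch of the 17-bit and 15-bit encodings. It assumes the byte positions of the InfiniBand BTH:

- PadCnt occupies bits 5..4 of byte 1.
- The 8-bit reserved field is byte 4.
- The 7-bit reserved field is the low bits of byte 8, next to the AckReq bit.

The description does not state a bit order. The ordering used here (pad count bits most significant, then the 8-bit field, then the 7-bit field) is an assumption, as are the names `pack_tid` and `unpack_tid`.

```python
BTH_LEN = 12  # the Base Transport Header is 12 bytes long


def pack_tid(bth: bytes, tid: int, use_pad_count: bool = True) -> bytes:
    """Return a copy of `bth` with `tid` written into reserved BTH bits.

    Assumed bit order (not specified in the description):
      17-bit mode: PadCnt (2 MSBs) | first reserved field (8) | second reserved field (7)
      15-bit mode: first reserved field (8 MSBs) | second reserved field (7)
    """
    width = 17 if use_pad_count else 15
    if len(bth) != BTH_LEN:
        raise ValueError("BTH must be 12 bytes")
    if not 0 <= tid < (1 << width):
        raise ValueError(f"transmission identifier does not fit in {width} bits")
    b = bytearray(bth)
    b[4] = (tid >> 7) & 0xFF               # byte 4: first reserved field (8 bits)
    b[8] = (b[8] & 0x80) | (tid & 0x7F)    # byte 8: keep AckReq (bit 7), fill 7 reserved bits
    if use_pad_count:
        # byte 1 = SE(1) | M(1) | PadCnt(2) | TVer(4); PadCnt occupies bits 5..4
        b[1] = (b[1] & 0xCF) | (((tid >> 15) & 0x3) << 4)
    return bytes(b)


def unpack_tid(bth: bytes, use_pad_count: bool = True) -> int:
    """Inverse of pack_tid under the same assumed layout."""
    tid = (bth[4] << 7) | (bth[8] & 0x7F)
    if use_pad_count:
        tid |= ((bth[1] >> 4) & 0x3) << 15
    return tid


if __name__ == "__main__":
    hdr = bytearray(BTH_LEN)
    hdr[8] = 0x80                         # AckReq set; must survive the encoding
    tagged = pack_tid(bytes(hdr), 0x1ABCD)
    print(hex(unpack_tid(tagged)))        # 0x1abcd
```

In the standard BTH, PadCnt normally indicates how many pad bytes follow the payload. The 15-bit variant (`use_pad_count=False`) therefore leaves that field untouched. Check the exact field positions against the IBTA specification revision in use before relying on this layout.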
  • the transmission identifier is located in an extension field in the RDMA header.
  • the RDMA header in the standard protocol can be extended, so that the RDMA header includes a newly extended extension field.
  • the size of the extension field in the RDMA header may be determined according to actual applications, for example, the length of the extension field in the RDMA header may be 10 bits or 15 bits.
  • the position of the extension field in the RDMA header may also be determined according to the actual application. For example, the extension field may be at the end of the RDMA header or anywhere in the middle of it. The embodiments of this application do not specifically limit the size or position of the extension field in the RDMA header.
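The size and position of the extension field are left open. The following sketch picks one arbitrary option purely for illustration: a 2-byte field appended after the 12-byte BTH, holding a transmission identifier of up to 16 bits (10 or 15 bits in the examples above). The layout and the names `add_extension` and `read_extension` are hypothetical. A receiver would have to know about the extension in advance, since a standard BTH parser would treat these bytes as payload.

```python
import struct

BTH_LEN = 12   # standard Base Transport Header length
EXT_LEN = 2    # hypothetical extension: one 16-bit container after the BTH


def add_extension(bth: bytes, tid: int, width: int = 15) -> bytes:
    """Append a hypothetical extension field carrying a `width`-bit identifier."""
    if len(bth) != BTH_LEN:
        raise ValueError("expected a 12-byte BTH")
    if not 1 <= width <= 8 * EXT_LEN:
        raise ValueError("width must fit in the 16-bit container")
    if not 0 <= tid < (1 << width):
        raise ValueError(f"transmission identifier does not fit in {width} bits")
    return bth + struct.pack("!H", tid)   # network byte order


def read_extension(header: bytes, width: int = 15) -> int:
    """Read the identifier back from a BTH followed by the extension field."""
    (value,) = struct.unpack_from("!H", header, BTH_LEN)
    return value & ((1 << width) - 1)


if __name__ == "__main__":
    hdr = add_extension(bytes(BTH_LEN), 1000, width=10)
    print(len(hdr), read_extension(hdr, width=10))   # 14 1000
```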
  • after obtaining a message sent by the application layer, the source network device generates an RDMA packet. The packet includes the RDMA data to be transmitted to the destination network device and a transmission identifier. The source network device then sends the RDMA packet through the intermediate network devices between the source network device and the destination network device, which forward it to the destination network device.
  • FIG. 5 is a schematic diagram of a source network device generating packets from messages according to an embodiment of this application.
  • the source network device receives message 1, message 2, message 3, and message 4 sent by the application layer, where message 2 and message 3 need to be transmitted in order. The source network device then generates packets from message 1, message 2, message 3, and message 4 based on the MTU.
  • the source network device generates packet 1 from message 1. Packet 1 includes the RDMA data in message 1 and transmission identifier 1.
  • because the data volume of message 2 is larger than the MTU, the source network device generates packet 2A and packet 2B from message 2, each carrying part of the RDMA data in message 2. Together, packets 2A and 2B carry all of the RDMA data in message 2. Because the data volume of message 3 is smaller than the MTU, the source network device generates a single packet 3 from message 3. Since message 2 and message 3 are to be transmitted in order, the source network device allocates the same transmission identifier, transmission identifier 2, to both. As a result, packets 2A, 2B, and 3 all include transmission identifier 2.
  • because the data volume of message 4 is larger than the MTU, the source network device generates packet 4A and packet 4B from message 4, each carrying part of the RDMA data in message 4. Packets 4A and 4B include the same transmission identifier, transmission identifier 3.
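The segmentation in FIG. 5 can be sketched as follows, under these simplifying assumptions:

- The MTU is treated as the RDMA data capacity of one packet, ignoring header overhead.
- Transmission identifiers are allocated sequentially starting from 1.
- Messages that must stay in order are marked with a shared, hypothetical `order_group` tag.

The message sizes in the usage example are also hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Message:
    name: str
    size: int                          # bytes of RDMA data in the message
    order_group: Optional[str] = None  # hypothetical tag: same tag => keep in order


@dataclass(frozen=True)
class Packet:
    name: str
    tid: int       # transmission identifier
    length: int    # bytes of RDMA data carried by this packet


def build_packets(messages: List[Message], mtu: int) -> List[Packet]:
    """Split each message into ceil(size / mtu) packets and assign transmission IDs."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    packets: List[Packet] = []
    group_tid: Dict[str, int] = {}
    next_tid = 1
    for msg in messages:
        if msg.order_group is not None and msg.order_group in group_tid:
            tid = group_tid[msg.order_group]      # ordered messages share one ID
        else:
            tid, next_tid = next_tid, next_tid + 1
            if msg.order_group is not None:
                group_tid[msg.order_group] = tid
        count = max(1, -(-msg.size // mtu))       # ceiling division, at least one packet
        for i in range(count):
            # Labels follow FIG. 5 ("2" -> "2A", "2B"); letters assume <= 26 segments.
            label = msg.name + (chr(ord("A") + i) if count > 1 else "")
            packets.append(Packet(label, tid, min(mtu, msg.size - i * mtu)))
    return packets


if __name__ == "__main__":
    msgs = [Message("1", 800), Message("2", 1500, "g"),
            Message("3", 600, "g"), Message("4", 2000)]
    for p in build_packets(msgs, mtu=1024):
        print(p.name, p.tid, p.length)
```

With an MTU of 1024 bytes, the usage example yields the packets and identifiers described for FIG. 5:

- Packet 1 carries identifier 1.
- Packets 2A, 2B, and 3 carry identifier 2.
- Packets 4A and 4B carry identifier 3.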
  • FIG. 6 is a schematic diagram of message transmission provided by an embodiment of the present application.
  • the source network device generates packets 1, 2A, 2B, 3, 4A, and 4B, and sends them to intermediate network device 1 in sequence.
  • when forwarding these packets, intermediate network device 1 determines forwarding path 1 and forwarding path 2 based on their destination address.
  • forwarding path 1 reaches the destination network device through intermediate network device 2. It may be identified by the egress port of intermediate network device 1 that connects to intermediate network device 2. Forwarding path 2 reaches the destination network device through intermediate network device 3. It may be identified by the egress port of intermediate network device 1 that connects to intermediate network device 3.
  • intermediate network device 1 selects forwarding path 1 based on transmission identifier 1 in packet 1, and sends packet 1 to intermediate network device 2.
  • when forwarding packets 2A, 2B, and 3, intermediate network device 1 selects forwarding path 2 based on transmission identifier 2 in these packets. It then sends packets 2A, 2B, and 3 to intermediate network device 3 in sequence.
  • when forwarding packets 4A and 4B, intermediate network device 1 selects forwarding path 1 based on transmission identifier 3 in these packets. It then sends packets 4A and 4B to intermediate network device 2 in sequence.
  • intermediate network device 2 sends packets 1, 4A, and 4B to intermediate network device 4 in sequence.
  • intermediate network device 3 sends packets 2A, 2B, and 3 to intermediate network device 4 in sequence.
  • after receiving packets 1, 4A, and 4B in sequence, intermediate network device 4 sends them to the destination network device in sequence. Likewise, after receiving packets 2A, 2B, and 3, it sends them to the destination network device in sequence.
  • packets 1, 4A, and 4B are forwarded through forwarding path 1, and packets 2A, 2B, and 3 are forwarded through forwarding path 2. This achieves network load balancing and improves the utilization of network bandwidth resources.
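The FIG. 6 walkthrough can be checked with a small simulation. The path chosen for each identifier is copied from the figure: identifiers 1 and 3 use forwarding path 1, and identifier 2 uses forwarding path 2. In a real device the choice would be computed, for example from a hash. The helper names are hypothetical, and packet labels are assumed to be unique. The check confirms the property the embodiment relies on: each identifier's packets share one path and keep their original order, even though the overall stream is split across two paths.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

PacketRef = Tuple[str, int]  # (packet label, transmission identifier); labels assumed unique


def forward(stream: List[PacketRef], select_path: Callable[[int], str]) -> Dict[str, List[str]]:
    """Append packets, in arrival order, to the FIFO queue of the path chosen for their ID."""
    queues: Dict[str, List[str]] = defaultdict(list)
    for label, tid in stream:
        queues[select_path(tid)].append(label)
    return dict(queues)


def order_preserved(stream: List[PacketRef], queues: Dict[str, List[str]]) -> bool:
    """True if every ID's packets sit on a single path in their original order."""
    for tid in {t for _, t in stream}:
        expected = [label for label, t in stream if t == tid]
        holders = [q for q in queues.values() if expected[0] in q]
        if len(holders) != 1:
            return False
        if [label for label in holders[0] if label in expected] != expected:
            return False
    return True


if __name__ == "__main__":
    stream = [("1", 1), ("2A", 2), ("2B", 2), ("3", 2), ("4A", 3), ("4B", 3)]
    fig6 = {1: "path 1", 2: "path 2", 3: "path 1"}   # path choice shown in FIG. 6
    queues = forward(stream, lambda tid: fig6[tid])
    print(queues)                           # {'path 1': ['1', '4A', '4B'], 'path 2': ['2A', '2B', '3']}
    print(order_preserved(stream, queues))  # True
```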
  • FIG. 6 introduces the process in which the intermediate network device selects a target forwarding path from multiple forwarding paths according to the transmission identifier in the message.
  • the source network device may also be dual-homed to multiple intermediate network devices, so the source network device may also select a target forwarding path from multiple forwarding paths according to the transmission identifier in the packet.
  • for example, the source server is dual-homed to two switches and forwards data through both of them.
  • FIG. 7 is a schematic diagram of another message transmission provided by the embodiment of the present application.
  • the source network device is also connected to the intermediate network device 1', that is, the source network device is dual-homed to the intermediate network device 1 and the intermediate network device 1'.
  • the intermediate network device 1 is connected to the intermediate network device 2 and the intermediate network device 3 respectively, and the intermediate network device 1' is also connected to the intermediate network device 2 and the intermediate network device 3 respectively.
  • after generating packets 1, 2A, 2B, 3, 4A, and 4B, the source network device forwards them based on their transmission identifiers.
  • the source network device selects a forwarding path through intermediate network device 1 based on transmission identifier 1 in packet 1, and sends packet 1 to intermediate network device 1.
  • intermediate network device 1 selects a forwarding path through intermediate network device 2 based on transmission identifier 1 in packet 1, and sends packet 1 to intermediate network device 2.
  • the source network device selects a forwarding path through intermediate network device 1' based on transmission identifier 2 in packets 2A, 2B, and 3. It then sends packets 2A, 2B, and 3 to intermediate network device 1' in sequence.
  • intermediate network device 1' selects a forwarding path through intermediate network device 3 based on transmission identifier 2, and sends packets 2A, 2B, and 3 to intermediate network device 3.
  • the source network device selects a forwarding path through intermediate network device 1 based on transmission identifier 3 in packets 4A and 4B. It then sends packets 4A and 4B to intermediate network device 1 in sequence.
  • intermediate network device 1 selects a forwarding path through intermediate network device 2 based on transmission identifier 3, and sends packets 4A and 4B to intermediate network device 2.
  • thus, when the source network device is dual-homed to intermediate network devices, it can also select a forwarding path from multiple forwarding paths based on the transmission identifier in each packet. This achieves network load balancing and improves the utilization of network bandwidth resources.
  • the present application further provides a network device.
  • FIG. 8 is a schematic structural diagram of a network device 800 provided in an embodiment of the present application.
  • the network device 800 includes an obtaining unit 801, a processing unit 802, and a sending unit 803:
    - the obtaining unit 801 is configured to obtain an RDMA packet that includes a transmission identifier, where RDMA packets with the same transmission identifier need to be transmitted in order;
    - the processing unit 802 is configured to determine a target forwarding path based on the transmission identifier and the destination address of the RDMA packet;
    - the sending unit 803 is configured to forward the RDMA packet through the target forwarding path.
  • the processing unit 802 is further configured to determine, based on the destination address of the RDMA packet, multiple forwarding paths that can be used to forward the RDMA packet, and to determine the target forwarding path from the multiple forwarding paths based on the transmission identifier.
  • the processing unit 802 is specifically configured to determine the target forwarding path from the multiple forwarding paths based on the transmission identifier and one or more of the following attributes of the RDMA packet: source address, destination address, source port number, destination port number, and protocol number.
  • the obtaining unit 801 is further configured to sequentially obtain multiple RDMA packets, each of which includes the transmission identifier and all of which have the same destination address. The processing unit 802 is further configured to forward the multiple RDMA packets in sequence through the target forwarding path.
  • the transmission identifier is located in the RDMA header of the RDMA packet.
  • the transmission identifier is located in one or more reserved fields in the RDMA header.
  • the transmission identifier is located in an extension field in the RDMA header.
  • the RDMA packet includes an RDMA over Converged Ethernet (RoCE) packet, an iWARP packet, or an InfiniBand packet.
  • the obtaining unit 801 is configured to obtain a message sent by the application layer. The processing unit 802 is configured to generate at least one RDMA packet based on the message, where the at least one RDMA packet includes the same transmission identifier.
  • the obtaining unit 801 is configured to receive the RDMA packet sent by a second network device, where the second network device is an upstream device of the first network device.
  • FIG. 9 is a schematic structural diagram of a network device 900 provided in an embodiment of the present application.
  • although the network device 900 in FIG. 9 shows some specific features, a person skilled in the art will appreciate from the embodiments of this application that various other features are omitted from FIG. 9 for brevity, so as not to obscure further relevant aspects of the embodiments disclosed in this application.
  • the network device 900 includes one or more processing units (for example, CPUs) 901, a network interface 902, a programming interface 903, a memory 904, and one or more communication buses 905 for interconnecting the components.
  • the network device 900 may also omit or add some functional components or units based on the above examples.
  • the network interface 902 is used to connect with one or more other network devices/servers in the network system.
  • communication bus 905 includes circuitry that interconnects and controls communication between system components.
  • Memory 904 can include nonvolatile memory, for example, read-only memory (read-only memory, ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM) , Electrically Erasable Programmable Read-Only Memory (electrically EPROM, EEPROM) or flash memory.
  • Memory 904 may also include volatile memory, which may be random access memory (RAM), which acts as an external cache.
  • the memory 904, or the non-transitory computer-readable storage medium of the memory 904, stores the following programs, modules, and data structures, or a subset thereof: for example, an obtaining unit (not shown in the figure), a sending unit (not shown in the figure), and a processing unit 9041.
  • the network device 900 may have any function of the first network device in the method embodiment corresponding to FIG. 2 above.
  • the network device 900 corresponds to the first network device in the foregoing method embodiments. The modules in the network device 900, and the other operations and/or functions described above, are respectively intended to implement the steps and methods performed by the first network device in the foregoing method embodiments.
  • for details of these steps and methods, refer to the method embodiment corresponding to FIG. 2. For brevity, details are not repeated here.
  • the network interface 902 on the network device 900 can perform data sending and receiving operations. Alternatively, the processor can invoke program code in the memory and, when necessary, cooperate with the network interface 902 to implement the functions of a transceiver unit.
  • the network device 900 is configured to execute the packet transmission method provided in the embodiment of the present application, for example, execute the packet transmission method corresponding to the embodiment shown in FIG. 2 above.
  • the specific structure of the network device described in FIG. 9 of this application may be as shown in FIG. 10 .
  • FIG. 10 is a schematic structural diagram of a network device 1000 provided by an embodiment of the present application.
  • the network device 1000 includes a main control board 1010 and an interface board 1030.
  • the main control board 1010 is also called a main processing unit (MPU) or a route processor. It controls and manages the components of the network device 1000, including route computation, device management, equipment maintenance, and protocol processing.
  • the main control board 1010 includes a CPU 1011 and a memory 1012.
  • the interface board 1030 is also called a line processing unit (line processing unit, LPU), a line card (line card), or a service board.
  • the interface board 1030 is used to provide various service interfaces and implement forwarding of data packets.
  • Service interfaces include but are not limited to Ethernet interfaces, POS (Packet over SONET/SDH) interfaces, etc.
  • the interface board 1030 includes a central processing unit 1031, a network processor 1032, a forwarding entry storage 1034, and a physical interface card (PIC) 1033.
  • the CPU 1031 on the interface board 1030 is used to control and manage the interface board 1030 and to communicate with the CPU 1011 on the main control board 1010.
  • the network processor 1032 is configured to implement message forwarding processing.
  • the form of the network processor 1032 may be a forwarding chip.
  • the physical interface card 1033 implements physical-layer interconnection. Original traffic enters the interface board 1030 through it, and processed packets are sent out from it.
  • the physical interface card 1033 includes at least one physical interface, also called a physical port, which may be a Flexible Ethernet (FlexE) physical interface.
  • the physical interface card 1033, also called a daughter card, can be installed on the interface board 1030. It converts optical/electrical signals into packets, checks the validity of the packets, and forwards them to the network processor 1032 for processing.
  • the central processing unit 1031 of the interface board 1030 can also perform the functions of the network processor 1032, for example, by implementing software forwarding on a general-purpose CPU. In that case the interface board 1030 does not need the network processor 1032.
  • the network device 1000 includes multiple interface boards.
  • the network device 1000 further includes an interface board 1040, and the interface board 1040 includes: a central processing unit 1041, a network processor 1042, a forwarding entry storage 1044, and a physical interface card 1043.
  • the network device 1000 further includes a switching fabric board 1020.
  • the switching fabric board 1020 may also be called a switch fabric unit (SFU).
  • the switching fabric board 1020 is used to exchange data between the interface boards.
  • the interface board 1030 and the interface board 1040 may communicate through the switching fabric board 1020 .
  • the main control board 1010 is coupled to the interface board.
  • the main control board 1010, the interface board 1030, the interface board 1040, and the switching fabric board 1020 are connected through a system bus and/or a system backplane to implement intercommunication.
  • an inter-process communication (IPC) protocol may be used for communication between the boards.
  • the network device 1000 includes a control plane and a forwarding plane.
  • the control plane includes a main control board 1010 and a central processing unit 1031.
  • the forwarding plane includes the components that perform forwarding, such as the forwarding entry storage 1034, the physical interface card 1033, and the network processor 1032.
  • the control plane performs functions such as publishing routes, generating forwarding tables, processing signaling and protocol messages, configuring and maintaining device status, etc., and the control plane sends the generated forwarding tables to the forwarding plane.
  • the network processor 1032 looks up the forwarding table delivered by the control plane and forwards the packets received by the physical interface card 1033 accordingly.
  • the forwarding table delivered by the control plane may be stored in the forwarding entry storage 1034. In some embodiments, the control plane and the forwarding plane may be completely separated and not located on the same device.
  • the obtaining unit 801 and the sending unit 803 in the network device 800 may correspond to the physical interface card 1033 or the physical interface card 1043 in the network device 1000. The processing unit 802 in the network device 800 may correspond to the central processing unit 1011 or the central processing unit 1031, or to the program code or instructions stored in the memory 1012.
  • the operations on the interface board 1040 in the embodiments of this application are the same as those on the interface board 1030 and, for brevity, are not described again.
  • the network device 1000 in this embodiment may correspond to the first network device in the foregoing method embodiments. The main control board 1010, the interface board 1030, and/or the interface board 1040 in the network device 1000 may implement the functions of, and/or the steps performed by, the first network device in the foregoing method embodiments. Details are not described herein again.
  • there may be one or more main control boards. When there are several, they may include an active main control board and a standby main control board.
  • there may be one or more interface boards. A network device with stronger data processing capability provides more interface boards, and each interface board may carry one or more physical interface cards.
  • there may be no switching fabric board, or one or more switching fabric boards. Multiple switching fabric boards can jointly implement load sharing and redundant backup.
  • under a centralized forwarding architecture, the network device does not need a switching fabric board, and the interface board handles service data processing for the entire system.
  • under a distributed forwarding architecture, the network device may have at least one switching fabric board, which exchanges data between multiple interface boards and provides large-capacity data exchange and processing capabilities.
  • the network device may also consist of a single board, that is, without a switching fabric board, with the functions of the interface board and the main control board integrated on that board.
  • in this case, the central processing unit on the interface board and the central processing unit on the main control board can be combined into one central processing unit on the single board that performs the functions of both. The architecture to use depends on the specific networking deployment scenario and is not uniquely limited here.
  • the foregoing first network device may be implemented as a virtualization device.
  • the virtualization device may be a virtual machine (virtual machine, VM) running a program for sending packets, a virtual router or a virtual switch.
  • Virtualization devices are deployed on hardware devices (eg, physical servers).
  • the first network device may be implemented based on a common physical server combined with a network functions virtualization (network functions virtualization, NFV) technology.
  • an embodiment of the present application also provides a computer program product, which, when running on a network device, causes the network device to execute the method performed by the first network device in the above method embodiment corresponding to FIG. 2 .
  • the embodiment of the present application also provides a chip system, including a processor and an interface circuit, and the interface circuit is configured to receive instructions and transmit them to the processor.
  • the processor is configured to implement the method in any one of the foregoing method embodiments.
  • the chip system further includes a memory, and there may be one or more processors in the chip system.
  • the processor can be realized by hardware or by software.
  • the processor may be a logic circuit, an integrated circuit, or the like.
  • the processor may be a general-purpose processor, and implements the method in any of the above method embodiments by reading the software code stored in the memory.
  • the memory can be integrated with the processor, or can be set separately from the processor, which is not limited in this application.
  • the memory may be a non-transitory memory, such as a read-only memory (ROM). It may be integrated with the processor on the same chip or arranged on a different chip.
  • this application does not specifically limit the type of the memory or how the memory and the processor are arranged.
  • B corresponding to A means that B is associated with A, and B can be determined according to A.
  • determining B according to A does not mean determining B only according to A, and B may also be determined according to A and/or other information.
  • the disclosed system, device and method can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of units is only a logical function division. In actual implementation, there may be other division methods.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

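This application discloses a packet transmission method and a related apparatus, which are used to achieve network load balancing and improve the utilization of network bandwidth resources. The method includes: a first network device obtains a remote direct memory access (RDMA) packet, where the RDMA packet includes a transmission identifier, and RDMA packets with the same transmission identifier need to be transmitted in order; the first network device determines a target forwarding path based on the transmission identifier and the destination address of the RDMA packet; and the first network device forwards the RDMA packet through the target forwarding path.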

Description

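Packet Transmission Method and Related Apparatus
This application claims priority to Chinese Patent Application No. 202110866839.3, filed with the China National Intellectual Property Administration on July 29, 2021 and entitled "PACKET TRANSMISSION METHOD AND RELATED APPARATUS", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communication technologies, and in particular, to a packet transmission method and a related apparatus.
Background
Applications in fields such as high-performance computing and big data analytics are developing rapidly, and they often require highly concurrent, low-latency storage input/output (IO). The existing Transmission Control Protocol/Internet Protocol (TCP/IP) stack is increasingly unable to meet these needs, mainly because TCP/IP relies on the CPU to transmit data. The resulting processing latency causes large delays in data transmission.
Remote Direct Memory Access (RDMA) technology meets the highly concurrent, low-latency storage IO requirements of these applications well. RDMA can quickly move data from one system into the memory of a remote system over a network without involving the operating system. RDMA eliminates external memory copies and context switching operations, thereby freeing memory bandwidth and CPU cycles and improving data transmission efficiency. Applications in fields such as high-performance computing and artificial intelligence now widely use the RDMA transport protocol for data transmission.
However, RDMA data is currently carried over the connection established by the application. Once the application establishes a connection between two devices, all RDMA data between them travels through the network along the same forwarding path. This easily causes unbalanced network load and limits the effective use of network bandwidth resources.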
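Summary
This application provides a packet transmission method in which packets that need to be transmitted in order are marked by carrying a transmission identifier, and a network device selects the forwarding path of each packet based on that transmission identifier. Packets that need to be transmitted in order are thus forwarded through the same forwarding path, while packets with different transmission identifiers can be forwarded through different forwarding paths. This achieves network load balancing and improves the utilization of network bandwidth resources.
A first aspect of this application provides a packet transmission method. A first network device obtains an RDMA packet, where the RDMA packet includes a transmission identifier, and RDMA packets with the same transmission identifier need to be transmitted in order. The RDMA packet obtained by the first network device is used to transmit RDMA data, and includes the transmission identifier and the RDMA data. The RDMA data is data that a source server needs to send to a destination server.
Then, the first network device determines a target forwarding path based on the transmission identifier and the destination address of the RDMA packet. The destination address determines which forwarding paths can be used to forward the RDMA packet. The transmission identifier is then used to select the target forwarding path from those paths.
Finally, the first network device forwards the RDMA packet through the target forwarding path.
In this application, RDMA packets that need to be transmitted in order are marked by carrying a transmission identifier, and the network device selects the forwarding path of each RDMA packet based on that transmission identifier. RDMA packets that need to be transmitted in order are thus forwarded through the same forwarding path, while RDMA packets with different transmission identifiers can be forwarded through different forwarding paths. This achieves network load balancing and improves the utilization of network bandwidth resources.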
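Optionally, the first network device determines, based on the destination address of the RDMA packet, multiple forwarding paths that can be used to forward the RDMA packet. The first network device then determines the target forwarding path from the multiple forwarding paths based on the transmission identifier.
Each of the multiple forwarding paths is identified by an egress port of the first network device. The first network device may look up a routing table based on the destination address of the RDMA packet to find the egress ports that can be used to forward it, one for each of the multiple forwarding paths. It may then select a target egress port from these ports based on the transmission identifier, and forward the RDMA packet through the target egress port.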
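The route lookup and per-identifier port selection described above can be sketched as follows. The sketch assumes a routing table that maps each IPv4 prefix to a list of egress ports, one port per forwarding path. It uses a plain modulo of the transmission identifier as the selection rule. The table contents, port names, and the modulo rule are illustrative assumptions, not part of the described method.

```python
import ipaddress
from typing import Dict, List


def lookup_egress_ports(routes: Dict[str, List[str]], dst: str) -> List[str]:
    """Longest-prefix match on `dst`; returns the egress ports, one per forwarding path."""
    addr = ipaddress.ip_address(dst)
    best_len = -1
    best_ports: List[str] = []
    for prefix, ports in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_ports = net.prefixlen, ports
    return list(best_ports)


def select_egress_port(ports: List[str], tid: int) -> str:
    """Map a transmission identifier to one egress port (modulo rule, assumed)."""
    if not ports:
        raise LookupError("no route to destination")
    return ports[tid % len(ports)]


if __name__ == "__main__":
    # Hypothetical table: a default route and a /24 reachable over two paths.
    routes = {"0.0.0.0/0": ["uplink0"], "192.0.2.0/24": ["to-dev2", "to-dev3"]}
    ports = lookup_egress_ports(routes, "192.0.2.10")
    print(ports)                                              # ['to-dev2', 'to-dev3']
    print([select_egress_port(ports, t) for t in (1, 2, 3)])  # ['to-dev3', 'to-dev2', 'to-dev3']
```

Any deterministic function of the identifier preserves the key property: packets with the same identifier always leave through the same port.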
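Optionally, the first network device determines the target forwarding path from the multiple forwarding paths based on the transmission identifier and one or more of the following attributes of the RDMA packet: source address, destination address, source port number, destination port number, and protocol number.
In other words, the first network device may use the transmission identifier together with one or more of these attributes as hash factors. It computes a hash value over the hash factors and determines the target forwarding path from the multiple forwarding paths based on that hash value.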
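To illustrate hashing over these factors, the sketch below builds a hash key from the five-tuple and the transmission identifier, then reduces the hash modulo the number of candidate paths. `zlib.crc32` is only a stand-in for whatever hash function a forwarding chip would actually use. The key layout and the function name `select_path` are assumptions.

```python
import ipaddress
import struct
import zlib
from typing import Sequence


def select_path(paths: Sequence[str], tid: int, src: str, dst: str,
                sport: int, dport: int, proto: int = 17) -> str:
    """Choose a path from a hash of the transmission ID and the five-tuple.

    zlib.crc32 is a stand-in for the hash a forwarding chip would use;
    `tid` must fit in 32 bits because of the "I" field below.
    """
    if not paths:
        raise ValueError("no candidate forwarding paths")
    key = (ipaddress.ip_address(src).packed        # source address
           + ipaddress.ip_address(dst).packed      # destination address
           + struct.pack("!HHBI", sport, dport, proto, tid))
    return paths[zlib.crc32(key) % len(paths)]


if __name__ == "__main__":
    paths = ["path 1", "path 2"]
    flow = ("10.0.0.1", "10.0.0.2", 49152, 4791)   # hypothetical RoCEv2 five-tuple over UDP
    for tid in (1, 2, 3):
        print(tid, select_path(paths, tid, *flow))
```

All packets of one RDMA connection normally share a five-tuple, so a hash over the five-tuple alone would send them all down one path. Adding the identifier to the key lets packets with different identifiers land on different paths, while packets with the same identifier always map to the same path.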
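Optionally, the first network device sequentially obtains multiple RDMA packets, each of which includes the transmission identifier and all of which have the same destination address. The first network device forwards them in sequence through the target forwarding path.
Optionally, the transmission identifier is located in the RDMA header of the RDMA packet. The RDMA header may be the standard header of the RDMA packet, for example, its Base Transport Header (BTH).
Optionally, the transmission identifier is located in one or more reserved fields in the RDMA header. In the standard protocol, the RDMA header includes one or more reserved fields, and the transmission identifier may be represented by the values in these reserved fields.
Representing the transmission identifier with one or more reserved fields in the RDMA header reduces changes to existing technology and makes the solution easier to implement.
Optionally, the transmission identifier is located in an extension field in the RDMA header.
Optionally, the RDMA packet includes an RDMA over Converged Ethernet (RoCE) packet, an iWARP packet, or an InfiniBand packet.
Optionally, obtaining the RDMA packet includes the following: the first network device obtains a message sent by the application layer, and generates at least one RDMA packet based on the message, where the at least one RDMA packet includes the same transmission identifier.
Optionally, the first network device receives the RDMA packet sent by a second network device, where the second network device is an upstream device of the first network device.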
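A second aspect of this application provides a network device, including an obtaining unit, a processing unit, and a sending unit. The obtaining unit is configured to obtain an RDMA packet, where the RDMA packet includes a transmission identifier, and RDMA packets with the same transmission identifier need to be transmitted in order. The processing unit is configured to determine a target forwarding path based on the transmission identifier and the destination address of the RDMA packet. The sending unit is configured to forward the RDMA packet through the target forwarding path.
Optionally, the processing unit is further configured to determine, based on the destination address of the RDMA packet, multiple forwarding paths that can be used to forward the RDMA packet, and to determine the target forwarding path from the multiple forwarding paths based on the transmission identifier.
Optionally, the processing unit is specifically configured to determine the target forwarding path from the multiple forwarding paths based on the transmission identifier and one or more of the following attributes of the RDMA packet: source address, destination address, source port number, destination port number, and protocol number.
Optionally, the obtaining unit is further configured to sequentially obtain multiple RDMA packets, each of which includes the transmission identifier and all of which have the same destination address. The processing unit is further configured to forward the multiple RDMA packets in sequence through the target forwarding path.
Optionally, the transmission identifier is located in the RDMA header of the RDMA packet.
Optionally, the transmission identifier is located in one or more reserved fields in the RDMA header.
Optionally, the transmission identifier is located in an extension field in the RDMA header.
Optionally, the RDMA packet includes an RDMA over Converged Ethernet (RoCE) packet, an iWARP packet, or an InfiniBand packet.
Optionally, the obtaining unit is configured to obtain a message sent by the application layer. The processing unit is configured to generate at least one RDMA packet based on the message, where the at least one RDMA packet includes the same transmission identifier.
Optionally, the obtaining unit is configured to receive the RDMA packet sent by a second network device, where the second network device is an upstream device of the first network device.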
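A third aspect of this application provides a network device. The network device includes a processor configured to enable the network device to implement the method described in the first aspect or any possible implementation of the first aspect. The device may further include a memory coupled to the processor. When the processor executes instructions stored in the memory, the network device implements the method described in any possible implementation of the first aspect. The device may also include a communication interface used to communicate with other devices. For example, the communication interface may be a transceiver, a circuit, a bus, a module, or another type of communication interface.
In this application, the instructions in the memory may be prestored, or may be downloaded from the Internet and stored when the network device is used. This application does not specifically limit the source of the instructions in the memory. Coupling in this application refers to an indirect coupling or connection between apparatuses, units, or modules. It may be electrical, mechanical, or in another form, and is used for information exchange between the apparatuses, units, or modules.
A fourth aspect of this application provides a computer storage medium, which may be non-volatile. The computer storage medium stores computer-readable instructions that, when executed by a processor, implement the method described in the first aspect or any possible implementation of the first aspect.
A fifth aspect of this application provides a computer program product including instructions. When the computer program product runs on a computer, the computer performs the method described in the first aspect or any possible implementation of the first aspect.
A sixth aspect of this application provides a network system that includes multiple network devices according to the second aspect or the third aspect.
The solutions provided in the second to sixth aspects are used to implement, or cooperate in implementing, the method provided in the first aspect. They can therefore achieve the same or corresponding beneficial effects as the first aspect. Details are not described herein again.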
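Brief Description of Drawings
FIG. 1 is a schematic diagram of an application architecture of a packet transmission method according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a packet transmission method 200 according to an embodiment of this application;
FIG. 3 is a schematic diagram of a format of an RDMA packet according to an embodiment of this application;
FIG. 4 is a schematic diagram of a format of an RDMA header according to an embodiment of this application;
FIG. 5 is a schematic diagram of a source network device generating packets from messages according to an embodiment of this application;
FIG. 6 is a schematic diagram of a packet transmission process according to an embodiment of this application;
FIG. 7 is a schematic diagram of another packet transmission process according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a network device 800 according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a network device 900 according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of a network device 1000 according to an embodiment of this application.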
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,下面结合附图,对本申请的实施例进行描述。显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。本领域普通技术人员可知,随着新应用场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的描述在适当情况下可以互换,以便使实施例能够以除了在本申请图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或模块的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或模块,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或模块。在本申请中出现的对步骤进行的命名或者编号,并不意味着必须按照命名或者编号所指示的时间/逻辑先后顺序执行方法流程中的步骤,已经命名或者编号的流程步骤可以根据要实现的技术目的变更执行顺序,只要能达到相同或者相类似的技术效果即可。本申请中所出现的单元的划分,是一种逻辑上的划分,实际应用中实现时可以有另外的划分方式,例如多个单元可以结合成或集成在另一个系统中,或一些特征可以忽略,或 不执行,另外,所显示的或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,单元之间的间接耦合或通信连接可以是电性或其他类似的形式,本申请中均不作限定。并且,作为分离部件说明的单元或子单元可以是也可以不是物理上的分离,可以是也可以不是物理单元,或者可以分布到多个电路单元中,可以根据实际的需要选择其中的部分或全部单元来实现本申请方案的目的。
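RDMA can move data quickly over a network from one system into the memory of a remote system without involving the operating system. RDMA eliminates external memory copies and context switches, which frees memory bandwidth and CPU cycles and improves data transmission efficiency. Currently, applications in fields such as high-performance computing and artificial intelligence widely use the RDMA transport protocol for data transmission.
However, RDMA data is currently carried over connections established by applications. Once an application has established a connection between two devices, all RDMA data between those two devices is transmitted through the network on the same forwarding path. This easily causes unbalanced network load and limits efficient use of network bandwidth resources.
In view of this, the embodiments of this application provide a packet transmission method in which a transmission identifier carried in each packet marks the packets that need to be transmitted in order, and a network device selects the forwarding path of a packet based on the transmission identifier in the packet. This ensures that packets that need to be transmitted in order are forwarded on the same forwarding path, while allowing packets with different transmission identifiers to be forwarded on different forwarding paths, thereby balancing network load and improving utilization of network bandwidth resources.
Refer to FIG. 1, which is a schematic diagram of an application architecture of the packet transmission method according to an embodiment of this application. As shown in FIG. 1, the application architecture includes a source network device, intermediate network devices 1 to 4, and a destination network device. The source and destination network devices may be servers, and intermediate network devices 1 to 4 may be physical devices such as routers, switches, or gateways.
The source network device needs to send RDMA packets to the destination network device, and the RDMA packets pass through the intermediate network devices between the two. While an RDMA packet travels from the source network device to the destination network device, each network device responsible for forwarding it selects the forwarding path of the RDMA packet based on the transmission identifier in the packet, so that RDMA packets with the same transmission identifier are forwarded on the same forwarding path.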
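Refer to FIG. 2, which is a schematic flowchart of a packet transmission method 200 according to an embodiment of this application. As shown in FIG. 2, the packet transmission method 200 includes the following steps 201 to 203.
Step 201: A first network device obtains an RDMA packet, where the RDMA packet includes a transmission identifier, and RDMA packets with the same transmission identifier need to be transmitted in order.
In this embodiment, the RDMA packet obtained by the first network device is a packet used to transmit RDMA data, and it includes the transmission identifier and the RDMA data. The RDMA data is data that a source server needs to send to a destination server: when the destination server accesses data on the source server using RDMA, the source server sends the requested RDMA data to the destination server. The transmission identifier may also be referred to as a transaction identifier.
In a possible implementation, the first network device may be, for example, a server.
For example, when the first network device is the source server, the application layer in the first network device sends a message to the transport layer in the first network device, where the message includes RDMA data to be sent to the destination server. After obtaining the message sent by the application layer, the first network device generates at least one RDMA packet based on the message, and the at least one RDMA packet includes a same transmission identifier.
Because RDMA data transmitted in a network is usually limited by the maximum transmission unit (MTU), the RDMA data in a single message sent by the application layer may be split by the first network device into one or more RDMA packets for transmission. The MTU is the size of the largest data packet that can pass through the network. When the data volume of the message obtained by the first network device is greater than the MTU, the first network device generates a plurality of RDMA packets based on the message, each carrying part of the RDMA data in the message; when the data volume of the message is less than or equal to the MTU, the first network device generates one RDMA packet that carries all the RDMA data in the message.
When the first network device generates a plurality of RDMA packets from one message sent by the application layer, it adds the same transmission identifier to these RDMA packets. Because the RDMA data in these packets belongs to the same message, the packets must be transmitted in order for the destination server to receive the complete RDMA data of the message; the first network device therefore adds the same transmission identifier to them so that they are transmitted in order. For example, the first network device may allocate one transmission identifier to the message, and every RDMA packet generated from the message carries the transmission identifier corresponding to the message, which ensures that all RDMA packets of the message include the same transmission identifier.
When the first network device obtains a plurality of messages sent by the application layer and these messages are indicated as needing to be transmitted in order, the first network device allocates the same transmission identifier to all of them. In this way, all the RDMA packets that the first network device generates from these messages include the same transmission identifier. For example, when the application layer sends a plurality of messages whose RDMA data is interrelated, the application layer indicates that the messages need to be transmitted in order.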
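The segmentation and identifier-allocation rules of step 201 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `RdmaPacket` fields, the `segment` and `build_packets` helpers, and the sequential ID allocator are hypothetical, the application's in-order indication is modeled as sets of message indices, and the MTU is treated as the per-packet payload budget (headers ignored) for simplicity.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class RdmaPacket:
    msg_index: int        # index of the application message the payload came from
    seq: int              # position of this packet within its message
    transmission_id: int  # shared by all packets that must stay in order
    payload: bytes


def segment(data: bytes, mtu: int) -> list[bytes]:
    """Split one message into chunks of at most `mtu` bytes.

    A message no larger than the MTU yields exactly one packet; an empty
    message is still sent as one (empty) packet.
    """
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    if not data:
        return [b""]
    return [data[i:i + mtu] for i in range(0, len(data), mtu)]


def build_packets(messages: list[bytes], ordered_groups: list[set[int]],
                  mtu: int) -> list[RdmaPacket]:
    """Allocate one transmission ID per independent message, or one per group
    of messages the application marked as in-order, then segment by MTU."""
    next_id = count(1)
    id_of: dict[int, int] = {}
    for i in range(len(messages)):
        if i in id_of:
            continue  # already covered by an earlier in-order group
        group = next((g for g in ordered_groups if i in g), {i})
        tid = next(next_id)
        for j in group:
            id_of[j] = tid
    packets = []
    for i, data in enumerate(messages):
        for seq, chunk in enumerate(segment(data, mtu)):
            packets.append(RdmaPacket(i, seq, id_of[i], chunk))
    return packets
```

With an MTU of 1024 bytes and hypothetical message sizes of 500, 1500, 300, and 2000 bytes, where the second and third messages (indices 1 and 2) are marked in-order, `build_packets` yields six packets carrying transmission identifiers 1, 2, 2, 2, 3, 3, the same grouping as the FIG. 5 example described later in this description.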
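In another possible implementation, the first network device may be a physical device such as a router, switch, or gateway, or may be a virtual device that supports packet forwarding.
Specifically, the first network device may obtain the RDMA packet by receiving it from a second network device, where the second network device is an upstream device of the first network device. The second network device may be, for example, the source server, a physical device such as a router, switch, or gateway, or a virtual device that supports packet forwarding. In short, both the first and second network devices are on the forwarding path of the RDMA packet: the second network device forwards the RDMA packet to the first network device, which then forwards it to its own next-hop device.
Optionally, the RDMA packet described in this embodiment may include an RDMA over Converged Ethernet (RoCE) packet, an iWARP packet, or an InfiniBand packet, which are transmitted based on the RoCE protocol, the iWARP protocol, and the InfiniBand protocol, respectively. RoCE is a network protocol that allows RDMA to be performed over Ethernet; iWARP is a network protocol that allows RDMA to be performed over the Transmission Control Protocol (TCP) and is in effect an Internet wide-area RDMA protocol; InfiniBand is a newer-generation network protocol that supports RDMA.
It should be noted that the RDMA packet in this embodiment may also be a packet transmitted based on another protocol; this embodiment does not limit the specific form of the RDMA packet.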
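Step 202: The first network device determines a target forwarding path based on the transmission identifier and the destination address of the RDMA packet.
In this embodiment, while forwarding the RDMA packet, the first network device determines the target forwarding path of the RDMA packet based on the transmission identifier and the destination address in the packet. The destination address determines the set of forwarding paths that can be used to forward the RDMA packet, and the transmission identifier is used to select the target forwarding path from among these candidate paths.
For example, the first network device may determine, based on the destination address of the RDMA packet, a plurality of forwarding paths that can be used to forward it, each identified by an egress port of the first network device. The first network device may look up a routing table using the destination address to determine a plurality of egress ports that can forward the RDMA packet, each corresponding to one of the forwarding paths. The first network device then determines the target forwarding path from the plurality of forwarding paths based on the transmission identifier; that is, it selects a target egress port from the plurality of egress ports based on the transmission identifier and forwards the RDMA packet through the target egress port.
Optionally, the first network device may use the transmission identifier as a hash factor, compute a hash value of the transmission identifier, and determine the target forwarding path from the plurality of forwarding paths based on that hash value.
Optionally, the first network device may use the transmission identifier together with one or more of the source address, destination address, source port number, destination port number, and protocol number of the RDMA packet as hash factors, compute a hash value over these factors, and determine the target forwarding path from the plurality of forwarding paths based on the result. For example, the first network device may use the transmission identifier and the source and destination addresses of the RDMA packet as hash factors and compute a hash value over these three factors to determine the target forwarding path. As another example, it may use the transmission identifier together with the source address, destination address, source port number, destination port number, and protocol number as hash factors and compute a hash value over all six factors to determine the target forwarding path.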
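Step 202 can be sketched as a longest-prefix routing lookup that returns the candidate egress ports, followed by a hash over the transmission identifier (and, optionally, extra hash factors) to pick one port. This is a hedged illustration under stated assumptions: the `ROUTES` table, port names, and the `candidate_ports`/`select_port` helpers are hypothetical, and SHA-256 is used only because it is deterministic across runs in Python; switch hardware typically uses cheaper hash functions.

```python
import hashlib
import ipaddress

# Hypothetical routing table: prefix -> egress ports, one port per forwarding path.
ROUTES: dict = {
    ipaddress.ip_network("10.0.2.0/24"): ["port1", "port2"],
    ipaddress.ip_network("0.0.0.0/0"): ["port1"],
}


def candidate_ports(dst: str, routes: dict = ROUTES) -> list[str]:
    """Longest-prefix match on the destination address; returns every egress
    port (forwarding path) that can reach it."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return routes[max(matches, key=lambda net: net.prefixlen)]


def select_port(transmission_id: int, dst: str, extra_factors: tuple = (),
                routes: dict = ROUTES) -> str:
    """Pick the target egress port by hashing the transmission ID, optionally
    together with extra hash factors such as (src, dst, sport, dport, proto)."""
    ports = candidate_ports(dst, routes)
    key = "|".join(str(f) for f in (transmission_id, *extra_factors)).encode()
    digest = hashlib.sha256(key).digest()
    return ports[int.from_bytes(digest[:4], "big") % len(ports)]
```

Calling `select_port(2, "10.0.2.7")` hashes the transmission identifier alone; passing `("10.0.1.5", "10.0.2.7")` as `extra_factors` gives the three-factor variant, and `("10.0.1.5", "10.0.2.7", 49152, 4791, 17)` the six-factor variant (4791 is the RoCEv2 UDP destination port and 17 the UDP protocol number; the addresses and source port are made up). Because the result depends only on the hash inputs, every packet with the same identifier and flow fields maps to the same port, while different identifiers spread across the candidates.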
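Step 203: The first network device forwards the RDMA packet through the target forwarding path.
After determining the target forwarding path, the first network device forwards the RDMA packet along it. For example, when the target forwarding path is identified by a target egress port, the first network device forwards the RDMA packet through the target egress port to its next-hop device.
In this embodiment, RDMA packets that need to be transmitted in order are marked by a transmission identifier carried in the packets, and network devices select the forwarding path of each RDMA packet based on that identifier. Because RDMA packets that need to be transmitted in order carry the same transmission identifier, they are all forwarded on the same forwarding path, which preserves their order; at the same time, RDMA packets with different transmission identifiers can be forwarded on different forwarding paths, which balances network load and improves utilization of network bandwidth resources.
In a possible embodiment, the first network device may sequentially obtain a plurality of RDMA packets that all include the transmission identifier and have the same destination address. Because these RDMA packets have the same destination address and the same transmission identifier, the first network device determines the same target forwarding path among the plurality of forwarding paths for all of them, and sequentially forwards them through that target forwarding path.
In this embodiment, because the first network device selects the forwarding path of an RDMA packet based on its transmission identifier, it can sequentially forward, on the same forwarding path, a plurality of RDMA packets that need to be transmitted in order, which ensures in-order transmission of the RDMA packets.
The foregoing describes how a network device forwards RDMA packets based on the transmission identifier in the packets. The following describes how an RDMA packet carries the transmission identifier.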
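In a possible embodiment, the transmission identifier in an RDMA packet is located in the RDMA header of the packet. The RDMA header may be a packet header of the RDMA packet, for example the Base Transport Header (BTH) of the RDMA packet. Refer to FIG. 3, which is a schematic diagram of a format of an RDMA packet according to an embodiment of this application. As shown in FIG. 3, the RDMA packet includes an Ethernet header (Eth Header), an Internet Protocol header (IP Header), a User Datagram Protocol header (UDP Header), a BTH, and a payload. The transmission identifier in the RDMA packet may be located in the BTH.
Optionally, the transmission identifier is located in one or more reserved fields of the RDMA header. In the standard protocol, the RDMA header includes one or more reserved fields, and the transmission identifier may be represented by the values in these reserved fields. Refer to FIG. 4, which is a schematic diagram of a format of an RDMA header according to an embodiment of this application. As shown in FIG. 4, the RDMA header includes a plurality of reserved fields: the pad count field (2 bits), the first reserved field (8 bits), and the second reserved field (7 bits) are all reserved fields in the RDMA header, and the transmission identifier may be represented by the values in these fields. For example, when the transmission identifier is represented by all three fields, it is 17 bits long (2 + 8 + 7); when it is represented by the first and second reserved fields only, it is 15 bits long (8 + 7).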
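The 15-bit option (first and second reserved fields only) can be sketched as below. This assumes the common 12-byte BTH layout: byte 0 OpCode; byte 1 SE, M, PadCnt, TVer; bytes 2 to 3 P_Key; byte 4 an 8-bit reserved field; bytes 5 to 7 DestQP; byte 8 the AckReq bit plus a 7-bit reserved field; bytes 9 to 11 PSN. The split of the identifier (high 8 bits in byte 4, low 7 bits in byte 8) and the function names are hypothetical choices for illustration. The pad count field is left untouched here because the BTH normally uses it to record payload padding; verify the layout against the specification revision actually in use.

```python
BTH_LEN = 12            # Base Transport Header length in bytes
RESV8A_OFFSET = 4       # byte 4: first reserved field (8 bits), followed by 24-bit DestQP
ACK_RESV7_OFFSET = 8    # byte 8: AckReq flag (bit 7) + second reserved field (bits 6..0)
TID_BITS = 15           # 8 + 7 reserved bits
MAX_TID = (1 << TID_BITS) - 1


def write_transmission_id(bth: bytearray, tid: int) -> None:
    """Store a 15-bit transmission ID in the two BTH reserved fields
    (hypothetical split: high 8 bits in byte 4, low 7 bits in byte 8)."""
    if len(bth) < BTH_LEN:
        raise ValueError("buffer shorter than a BTH")
    if not 0 <= tid <= MAX_TID:
        raise ValueError(f"transmission ID must fit in {TID_BITS} bits")
    bth[RESV8A_OFFSET] = tid >> 7
    # Preserve the AckReq bit; overwrite only the 7 reserved bits.
    bth[ACK_RESV7_OFFSET] = (bth[ACK_RESV7_OFFSET] & 0x80) | (tid & 0x7F)


def read_transmission_id(bth: bytes) -> int:
    """Reassemble the 15-bit transmission ID written by write_transmission_id."""
    if len(bth) < BTH_LEN:
        raise ValueError("buffer shorter than a BTH")
    return (bth[RESV8A_OFFSET] << 7) | (bth[ACK_RESV7_OFFSET] & 0x7F)
```

A forwarding device would call `read_transmission_id` on the BTH bytes and feed the result into its path-selection hash; the other BTH fields (opcode, P_Key, DestQP, PSN, AckReq) are left unchanged by `write_transmission_id`.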
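Optionally, the transmission identifier is located in an extension field of the RDMA header. Specifically, in practice, the RDMA header defined in the standard protocol may be extended so that it includes a newly added extension field. The size of the extension field may be determined according to the actual application, for example 10 bits or 15 bits, and its position in the RDMA header may likewise be determined according to the actual application, for example at the end of the RDMA header or at any intermediate position. The embodiments of this application do not specifically limit the size or position of the extension field.
To facilitate understanding of the packet transmission method provided in the embodiments of this application, the method is described in detail below with specific examples.
Taking the system architecture shown in FIG. 1 as an example, after obtaining a message sent by the application layer, the source network device generates RDMA packets that include the RDMA data to be transmitted to the destination network device and a transmission identifier. The source network device then transmits the RDMA packets through the intermediate network devices between the source and destination network devices, so that the intermediate network devices forward them to the destination network device.
Refer to FIG. 5, which is a schematic diagram of a source network device generating packets from messages according to an embodiment of this application. As shown in FIG. 5, the source network device receives message 1, message 2, message 3, and message 4 sent by the application layer, where message 2 and message 3 need to be transmitted in order. The source network device then generates the corresponding packets from messages 1 to 4 based on the MTU.
Specifically, because the data volume of message 1 is smaller than the MTU, the source network device generates packet 1 from message 1; packet 1 includes the RDMA data in message 1 and transmission identifier 1.
Because the data volume of message 2 is greater than the MTU, the source network device generates packet 2A and packet 2B from message 2, each carrying part of the RDMA data in message 2; together, packets 2A and 2B carry all of the RDMA data in message 2. In addition, because the data volume of message 3 is smaller than the MTU, the source network device generates packet 3 from message 3. Because messages 2 and 3 need to be transmitted in order, the source network device allocates the same transmission identifier, transmission identifier 2, to both messages. As a result, packets 2A, 2B, and 3 all include transmission identifier 2.
Because the data volume of message 4 is greater than the MTU, the source network device generates packet 4A and packet 4B from message 4, each carrying part of the RDMA data in message 4. Packets 4A and 4B include the same transmission identifier, transmission identifier 3.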
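Refer to FIG. 6, which is a schematic diagram of a packet transmission process according to an embodiment of this application. As shown in FIG. 6, the source network device generates packet 1, packet 2A, packet 2B, packet 3, packet 4A, and packet 4B and sends them to intermediate network device 1 in sequence.
When forwarding these packets, intermediate network device 1 determines forwarding path 1 and forwarding path 2 based on the destination address of the packets. Forwarding path 1 reaches the destination network device through intermediate network device 2 and may be identified by the egress port of intermediate network device 1 that connects to intermediate network device 2; forwarding path 2 reaches the destination network device through intermediate network device 3 and may be identified by the egress port of intermediate network device 1 that connects to intermediate network device 3.
Specifically, when forwarding packet 1, intermediate network device 1 selects forwarding path 1 based on transmission identifier 1 in packet 1, and therefore sends packet 1 to intermediate network device 2.
When forwarding packets 2A, 2B, and 3, intermediate network device 1 selects forwarding path 2 based on transmission identifier 2 in these packets, and therefore sends packets 2A, 2B, and 3 to intermediate network device 3 in sequence.
When forwarding packets 4A and 4B, intermediate network device 1 selects forwarding path 1 based on transmission identifier 3 in these packets, and therefore sends packets 4A and 4B to intermediate network device 2 in sequence.
Finally, intermediate network device 2 sends packets 1, 4A, and 4B to intermediate network device 4 in sequence, and intermediate network device 3 sends packets 2A, 2B, and 3 to intermediate network device 4 in sequence. After receiving packets 1, 4A, and 4B in sequence, intermediate network device 4 sends them to the destination network device in the same order; likewise, after receiving packets 2A, 2B, and 3 in sequence, it sends them to the destination network device in the same order.
As FIG. 6 shows, of the six packets generated by the source network device (packets 1, 2A, 2B, 3, 4A, and 4B), packets 1, 4A, and 4B are forwarded on forwarding path 1, and packets 2A, 2B, and 3 are forwarded on forwarding path 2. This balances network load and improves utilization of network bandwidth resources.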
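FIG. 6 illustrates how an intermediate network device selects a target forwarding path from a plurality of forwarding paths based on the transmission identifier in a packet. In practice, a source network device may also be dual-homed to a plurality of intermediate network devices, in which case the source network device can likewise select a target forwarding path from a plurality of forwarding paths based on the transmission identifier in a packet. For example, in a data center, a source server may be dual-homed to two switches and forward data through both.
Refer to FIG. 7, which is a schematic diagram of another packet transmission process according to an embodiment of this application. Compared with FIG. 6, the source network device is additionally connected to intermediate network device 1', that is, it is dual-homed to intermediate network device 1 and intermediate network device 1'. Intermediate network device 1 is connected to intermediate network devices 2 and 3, and intermediate network device 1' is also connected to intermediate network devices 2 and 3.
After generating packets 1, 2A, 2B, 3, 4A, and 4B, the source network device forwards them based on the transmission identifiers in the packets.
Specifically, when forwarding packet 1, the source network device selects the forwarding path through intermediate network device 1 based on transmission identifier 1 in packet 1, and sends packet 1 to intermediate network device 1. Intermediate network device 1 then selects the forwarding path through intermediate network device 2 based on transmission identifier 1, and sends packet 1 to intermediate network device 2.
When forwarding packets 2A, 2B, and 3, the source network device selects the forwarding path through intermediate network device 1' based on transmission identifier 2 in these packets, and sends packets 2A, 2B, and 3 to intermediate network device 1' in sequence. Intermediate network device 1' then selects the forwarding path through intermediate network device 3 based on transmission identifier 2, and sends packets 2A, 2B, and 3 to intermediate network device 3.
When forwarding packets 4A and 4B, the source network device selects the forwarding path through intermediate network device 1 based on transmission identifier 3 in these packets, and sends packets 4A and 4B to intermediate network device 1 in sequence. Intermediate network device 1 then selects the forwarding path through intermediate network device 2 based on transmission identifier 3, and sends packets 4A and 4B to intermediate network device 2.
As FIG. 7 shows, when the source network device is dual-homed to intermediate network devices, it can also select a forwarding path from a plurality of forwarding paths based on the transmission identifier in each packet, thereby balancing network load and improving utilization of network bandwidth resources.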
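To implement the foregoing embodiments, this application further provides a network device. Refer to FIG. 8, which is a schematic structural diagram of a network device 800 according to an embodiment of this application.
As shown in FIG. 8, the network device 800 includes an obtaining unit 801, a processing unit 802, and a sending unit 803. The obtaining unit 801 is configured to obtain an RDMA packet, where the RDMA packet includes a transmission identifier, and RDMA packets with the same transmission identifier need to be transmitted in order; the processing unit 802 is configured to determine a target forwarding path based on the transmission identifier and the destination address of the RDMA packet; and the sending unit 803 is configured to forward the RDMA packet through the target forwarding path.
Optionally, the processing unit 802 is further configured to: determine, based on the destination address of the RDMA packet, a plurality of forwarding paths that can be used to forward the RDMA packet; and determine the target forwarding path from the plurality of forwarding paths based on the transmission identifier.
Optionally, the processing unit 802 is specifically configured to determine the target forwarding path from the plurality of forwarding paths based on the transmission identifier and one or more of the source address, destination address, source port number, destination port number, and protocol number of the RDMA packet.
Optionally, the obtaining unit 801 is further configured to sequentially obtain a plurality of RDMA packets, where each of the plurality of RDMA packets includes the transmission identifier and the plurality of RDMA packets have the same destination address; and the processing unit 802 is further configured to sequentially forward the plurality of RDMA packets through the target forwarding path.
Optionally, the transmission identifier is located in an RDMA header of the RDMA packet.
Optionally, the transmission identifier is located in one or more reserved fields of the RDMA header.
Optionally, the transmission identifier is located in an extension field of the RDMA header.
Optionally, the RDMA packet includes an RDMA over Converged Ethernet (RoCE) packet, an iWARP packet, or an InfiniBand packet.
Optionally, the obtaining unit 801 is configured to obtain a message sent by an application layer; and the processing unit 802 is configured to generate at least one RDMA packet based on the message, where the at least one RDMA packet includes a same transmission identifier.
Optionally, the obtaining unit 801 is configured to receive the RDMA packet sent by a second network device, where the second network device is an upstream device of the first network device.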
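Refer to FIG. 9, which is a schematic structural diagram of a network device 900 according to an embodiment of this application. Although the network device 900 shown in FIG. 9 illustrates certain specific features, a person skilled in the art will appreciate from the embodiments of this application that, for brevity, FIG. 9 does not show various other features, so as not to obscure more pertinent aspects of the implementations disclosed in the embodiments of this application. To this end, as an example, in some implementations the network device 900 includes one or more processing units (such as CPUs) 901, a network interface 902, a programming interface 903, a memory 904, and one or more communication buses 905 for interconnecting the components. In other implementations, the network device 900 may omit or add some functional components or units relative to this example.
In some implementations, the network interface 902 is used to connect to one or more other network devices/servers in a network system. In some implementations, the communication bus 905 includes circuitry that interconnects system components and controls communication between them. The memory 904 may include a non-volatile memory, for example a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The memory 904 may also include a volatile memory, which may be a random access memory (RAM) used as an external cache.
In some implementations, the memory 904, or the non-transitory computer-readable storage medium of the memory 904, stores the following programs, modules, and data structures, or a subset thereof, including, for example, an obtaining unit (not shown), a sending unit (not shown), and a processing unit 9041.
In a possible embodiment, the network device 900 may have any function of the first network device in the method embodiment corresponding to FIG. 2.
It should be understood that the network device 900 corresponds to the first network device in the foregoing method embodiment, and the modules in the network device 900 and the other operations and/or functions described above are respectively intended to implement the steps and methods performed by the first network device in that method embodiment. For details, refer to the method embodiment corresponding to FIG. 2; for brevity, they are not repeated here.
It should be understood that in this application, data may be sent and received by the network interface 902 of the network device 900, or the processor may invoke program code in the memory and, when necessary, cooperate with the network interface 902 to implement the functions of the transceiver unit.
In various implementations, the network device 900 is configured to perform the packet transmission method provided in the embodiments of this application, for example the packet transmission method corresponding to the embodiment shown in FIG. 2.
A specific structure of the network device described in FIG. 9 of this application may be that shown in FIG. 10.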
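FIG. 10 is a schematic structural diagram of a network device 1000 according to an embodiment of this application. The network device 1000 includes a main control board 1010 and an interface board 1030.
The main control board 1010 is also referred to as a main processing unit (MPU) or a route processor, and is configured to control and manage the components of the network device 1000, including route calculation, device management, device maintenance, and protocol processing. The main control board 1010 includes a central processing unit 1011 and a memory 1012.
The interface board 1030 is also referred to as a line processing unit (LPU), a line card, or a service board. It is configured to provide various service interfaces and to forward data packets. The service interfaces include but are not limited to Ethernet interfaces and POS (Packet over SONET/SDH) interfaces. The interface board 1030 includes a central processing unit 1031, a network processor 1032, a forwarding entry memory 1034, and a physical interface card (PIC) 1033.
The central processing unit 1031 on the interface board 1030 is configured to control and manage the interface board 1030 and to communicate with the central processing unit 1011 on the main control board 1010.
The network processor 1032 is configured to perform packet forwarding processing. The network processor 1032 may take the form of a forwarding chip.
The physical interface card 1033 implements physical-layer interconnection: raw traffic enters the interface board 1030 through it, and processed packets are sent out from it. The physical interface card 1033 includes at least one physical interface, also referred to as a physical port, which may be a Flexible Ethernet (FlexE) physical interface. The physical interface card 1033, also referred to as a daughter card, may be installed on the interface board 1030; it converts optical/electrical signals into packets, checks the validity of the packets, and then forwards them to the network processor 1032 for processing. In some embodiments, the central processing unit 1031 of the interface board 1030 may also perform the functions of the network processor 1032, for example implementing software forwarding on a general-purpose CPU, in which case the interface board 1030 does not need the network processor 1032.
Optionally, the network device 1000 includes a plurality of interface boards; for example, it further includes an interface board 1040, which includes a central processing unit 1041, a network processor 1042, a forwarding entry memory 1044, and a physical interface card 1043.
Optionally, the network device 1000 further includes a switching board 1020, also referred to as a switch fabric unit (SFU). When the network device has a plurality of interface boards, the switching board 1020 is used to exchange data between them; for example, the interface board 1030 and the interface board 1040 may communicate through the switching board 1020.
The main control board 1010 is coupled to the interface boards. For example, the main control board 1010, the interface board 1030, the interface board 1040, and the switching board 1020 are connected through a system bus and/or a system backplane for intercommunication. In a possible implementation, an inter-process communication (IPC) channel is established between the main control board 1010 and the interface board 1030, and the two communicate through the IPC channel.
Logically, the network device 1000 includes a control plane and a forwarding plane. The control plane includes the main control board 1010 and the central processing unit 1031; the forwarding plane includes the components that perform forwarding, such as the forwarding entry memory 1034, the physical interface card 1033, and the network processor 1032. The control plane performs functions such as advertising routes, generating forwarding tables, processing signaling and protocol packets, and configuring and maintaining the device status, and delivers the generated forwarding tables to the forwarding plane. On the forwarding plane, the network processor 1032 looks up the forwarding table delivered by the control plane to forward packets received by the physical interface card 1033. The forwarding table delivered by the control plane may be stored in the forwarding entry memory 1034. In some embodiments, the control plane and the forwarding plane may be completely separated and not located on the same device.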
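It should be understood that the obtaining unit 801 and the sending unit 803 in the network device 800 may correspond to the physical interface card 1033 or the physical interface card 1043 in the network device 1000, and the processing unit 802 in the network device 800 may correspond to the central processing unit 1011 or the central processing unit 1031 in the network device 1000, or to program code or instructions stored in the memory 1012.
It should be understood that in the embodiments of this application, operations on the interface board 1040 are consistent with those on the interface board 1030; for brevity, details are not repeated. It should also be understood that the network device 1000 in this embodiment may correspond to the first network device in each of the foregoing method embodiments, and the main control board 1010, the interface board 1030, and/or the interface board 1040 in the network device 1000 may implement the functions of and/or the steps performed by the first network device in those method embodiments; for brevity, details are not repeated here.
It should be noted that there may be one or more main control boards; when there are several, they may include an active main control board and a standby main control board. There may be one or more interface boards; the stronger the data processing capability of the network device, the more interface boards it provides. There may also be one or more physical interface cards on an interface board. There may be no switching board, or one or more switching boards; when there are several, they can jointly implement load sharing and redundancy backup. In a centralized forwarding architecture, the network device may not need a switching board, and the interface board handles service data processing for the entire system. In a distributed forwarding architecture, the network device may have at least one switching board, through which data is exchanged between a plurality of interface boards, providing high-capacity data exchange and processing capabilities. Optionally, the network device may have only one board, that is, no switching board, with the functions of the interface board and the main control board integrated on that single board; in this case, the central processing unit on the interface board and the central processing unit on the main control board may be combined into one central processing unit on that board that performs the combined functions of both. The architecture used depends on the specific networking deployment scenario and is not uniquely limited here.
In some possible embodiments, the first network device may be implemented as a virtualized device. The virtualized device may be a virtual machine (VM) running a program with a packet-sending function, a virtual router, or a virtual switch, and is deployed on a hardware device (for example, a physical server). For example, the first network device may be implemented based on a general-purpose physical server combined with network functions virtualization (NFV) technology.
It should be understood that the network devices in the foregoing product forms each have any function of the first network device in the foregoing method embodiments; details are not repeated here.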
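Further, an embodiment of this application provides a computer program product that, when run on a network device, causes the network device to perform the method performed by the first network device in the method embodiment corresponding to FIG. 2.
An embodiment of this application further provides a chip system, including a processor and an interface circuit, where the interface circuit is configured to receive instructions and transmit them to the processor, and the processor is configured to implement the method in any of the foregoing method embodiments.
Optionally, the chip system further includes a memory, and there may be one or more processors in the chip system. The processor may be implemented by hardware or by software. When implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented by software, the processor may be a general-purpose processor that implements the method in any of the foregoing method embodiments by reading software code stored in the memory.
Optionally, there may also be one or more memories in the chip system. The memory may be integrated with the processor or disposed separately from it; this is not limited in this application. For example, the memory may be a non-transitory memory, such as a read-only memory (ROM), which may be integrated on the same chip as the processor or disposed on a different chip. This application does not specifically limit the type of memory or the arrangement of the memory and the processor.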
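The embodiments of this application are described in detail above. The steps of the method in the embodiments may be reordered, combined, or deleted according to actual needs, and the modules of the apparatus in the embodiments may be divided, combined, or deleted according to actual needs.
It should be understood that "one embodiment" or "an embodiment" mentioned throughout the specification means that particular features, structures, or characteristics related to the embodiment are included in at least one embodiment of this application. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. In addition, these particular features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. It should also be understood that, in the embodiments of this application, the sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
The term "and/or" in this specification describes only an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: only A exists, both A and B exist, or only B exists. In addition, the character "/" in this specification generally indicates an "or" relationship between the associated objects.
It should be understood that, in the embodiments of this application, "B corresponding to A" indicates that B is associated with A and that B can be determined based on A. However, determining B based on A does not mean that B is determined based only on A; B may also be determined based on A and/or other information.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the detailed working processes of the foregoing systems, apparatuses, and units, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the described apparatus embodiments are merely illustrative; the division into units is merely a logical function division, and there may be other divisions in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.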

Claims (21)

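  1. A packet transmission method, comprising:
    obtaining, by a first network device, a remote direct memory access (RDMA) packet, wherein the RDMA packet comprises a transmission identifier, and RDMA packets with a same transmission identifier need to be transmitted in order;
    determining, by the first network device, a target forwarding path based on the transmission identifier and a destination address of the RDMA packet; and
    forwarding, by the first network device, the RDMA packet through the target forwarding path.
  2. The method according to claim 1, wherein the determining, by the first network device, a target forwarding path based on the transmission identifier and a destination address of the RDMA packet comprises:
    determining, by the first network device based on the destination address of the RDMA packet, a plurality of forwarding paths that can be used to forward the RDMA packet; and
    determining, by the first network device, the target forwarding path from the plurality of forwarding paths based on the transmission identifier.
  3. The method according to claim 2, wherein the determining, by the first network device, the target forwarding path from the plurality of forwarding paths based on the transmission identifier comprises:
    determining, by the first network device, the target forwarding path from the plurality of forwarding paths based on the transmission identifier and one or more of a source address, the destination address, a source port number, a destination port number, and a protocol number of the RDMA packet.
  4. The method according to any one of claims 1 to 3, wherein the obtaining, by a first network device, an RDMA packet comprises:
    sequentially obtaining, by the first network device, a plurality of RDMA packets, wherein each of the plurality of RDMA packets comprises the transmission identifier, and the plurality of RDMA packets have a same destination address; and
    the forwarding, by the first network device, the RDMA packet through the target forwarding path comprises:
    sequentially forwarding, by the first network device, the plurality of RDMA packets through the target forwarding path.
  5. The method according to any one of claims 1 to 4, wherein the transmission identifier is located in an RDMA header of the RDMA packet.
  6. The method according to claim 5, wherein the transmission identifier is located in one or more reserved fields of the RDMA header.
  7. The method according to claim 5, wherein the transmission identifier is located in an extension field of the RDMA header.
  8. The method according to any one of claims 1 to 7, wherein the RDMA packet comprises an RDMA over Converged Ethernet (RoCE) packet, an iWARP packet, or an InfiniBand packet.
  9. The method according to any one of claims 1 to 8, wherein the obtaining, by a first network device, a remote direct memory access (RDMA) packet comprises:
    obtaining, by the first network device, a message sent by an application layer; and
    generating, by the first network device, at least one RDMA packet based on the message, wherein the at least one RDMA packet comprises a same transmission identifier.
  10. The method according to any one of claims 1 to 8, wherein the obtaining, by a first network device, a remote direct memory access (RDMA) packet comprises:
    receiving, by the first network device, the RDMA packet sent by a second network device, wherein the second network device is an upstream device of the first network device.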
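  11. A network device, comprising an obtaining unit, a processing unit, and a sending unit, wherein
    the obtaining unit is configured to obtain a remote direct memory access (RDMA) packet, wherein the RDMA packet comprises a transmission identifier, and RDMA packets with a same transmission identifier need to be transmitted in order;
    the processing unit is configured to determine a target forwarding path based on the transmission identifier and a destination address of the RDMA packet; and
    the sending unit is configured to forward the RDMA packet through the target forwarding path.
  12. The network device according to claim 11, wherein the processing unit is further configured to:
    determine, based on the destination address of the RDMA packet, a plurality of forwarding paths that can be used to forward the RDMA packet; and
    determine the target forwarding path from the plurality of forwarding paths based on the transmission identifier.
  13. The network device according to claim 12, wherein the processing unit is specifically configured to determine the target forwarding path from the plurality of forwarding paths based on the transmission identifier and one or more of a source address, the destination address, a source port number, a destination port number, and a protocol number of the RDMA packet.
  14. The network device according to any one of claims 11 to 13, wherein the obtaining unit is further configured to sequentially obtain a plurality of RDMA packets, wherein each of the plurality of RDMA packets comprises the transmission identifier, and the plurality of RDMA packets have a same destination address; and
    the processing unit is further configured to sequentially forward the plurality of RDMA packets through the target forwarding path.
  15. The network device according to any one of claims 11 to 14, wherein the transmission identifier is located in an RDMA header of the RDMA packet.
  16. The network device according to claim 15, wherein the transmission identifier is located in one or more reserved fields of the RDMA header.
  17. The network device according to claim 15, wherein the transmission identifier is located in an extension field of the RDMA header.
  18. The network device according to any one of claims 11 to 17, wherein the RDMA packet comprises an RDMA over Converged Ethernet (RoCE) packet, an iWARP packet, or an InfiniBand packet.
  19. The network device according to any one of claims 11 to 18, wherein the obtaining unit is configured to obtain a message sent by an application layer; and
    the processing unit is configured to generate at least one RDMA packet based on the message, wherein the at least one RDMA packet comprises a same transmission identifier.
  20. The network device according to any one of claims 11 to 18, wherein the obtaining unit is configured to receive the RDMA packet sent by a second network device, wherein the second network device is an upstream device of the first network device.
  21. A network system, comprising a plurality of network devices according to any one of claims 11 to 20.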
PCT/CN2022/091979 2021-07-29 2022-05-10 Packet transmission method and related apparatus WO2023005335A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2024505160A JP2024527081A (ja) 2021-07-29 2022-05-10 Packet transmission method and related apparatus
EP22847949.9A EP4366266A4 (en) 2021-07-29 2022-05-10 MESSAGE TRANSMISSION METHOD AND ASSOCIATED DEVICE
US18/423,689 US20240214312A1 (en) 2021-07-29 2024-01-26 Packet transmission method and related apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110866839.3A CN115701060A (zh) 2021-07-29 2021-07-29 Packet transmission method and related apparatus
CN202110866839.3 2021-07-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/423,689 Continuation US20240214312A1 (en) 2021-07-29 2024-01-26 Packet transmission method and related apparatus

Publications (1)

Publication Number Publication Date
WO2023005335A1 true WO2023005335A1 (zh) 2023-02-02

Family

ID=85086246

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091979 WO2023005335A1 (zh) 2021-07-29 2022-05-10 Packet transmission method and related apparatus

Country Status (5)

Country Link
US (1) US20240214312A1 (zh)
EP (1) EP4366266A4 (zh)
JP (1) JP2024527081A (zh)
CN (1) CN115701060A (zh)
WO (1) WO2023005335A1 (zh)


Also Published As

Publication number Publication date
EP4366266A4 (en) 2024-10-16
JP2024527081A (ja) 2024-07-19
CN115701060A (zh) 2023-02-07
US20240214312A1 (en) 2024-06-27
EP4366266A1 (en) 2024-05-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22847949

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202427005246

Country of ref document: IN

ENP Entry into the national phase

Ref document number: 2024505160

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2022847949

Country of ref document: EP

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112024001703

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 2022847949

Country of ref document: EP

Effective date: 20240201

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 112024001703

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20240126