CN114172796A - Fault positioning method and related device for communication network
- Publication number: CN114172796A
- Application number: CN202111603641.2A
- Authority
- CN
- China
- Prior art keywords
- network
- switches
- switch
- flow
- communication
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0677—Localisation of faults
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/55—Prevention, detection or correction of errors
- H04L49/555—Error detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The application discloses a fault positioning method and a related device for a communication network, which can be used in the technical field of information security or in other fields. In the technical solution provided by the application, the communication network comprises a first end-side device, a second end-side device and N switches, the first end-side device and the second end-side device transmit a data stream through the N switches, and N is a positive integer. The network traffic of the data stream as it flows through each of S switches among the N switches is obtained, where S is a positive integer less than or equal to N. P switches at which the network traffic of the data stream is abnormal are determined from the network traffic of the data stream at each of the S switches, where P is a positive integer less than or equal to S. The network fault in the communication network is then located according to the position of each of the P switches in the communication network and the network traffic information of the data stream at each of the P switches.
Description
Technical Field
The present application relates to the field of information security technologies, and in particular, to a method for locating a fault in a communication network and a related device.
Background
The essence of a network is communication: it provides a path that ensures information is sent from the source to the destination without errors. A network is composed of nodes and the links connecting them, and the nodes can be computers, switches, routers or mobile terminals. During network operation, nodes at each level may automatically switch paths and establish data exchange channels, and in this process network faults such as packet loss or delay may occur. To keep the network operating normally, faults must be handled promptly, and the first task in resolving a network fault is to locate it.
Therefore, how to locate the fault of the communication network becomes an urgent problem to be solved.
Disclosure of Invention
The application provides a fault positioning method and a related device of a communication network, which realize real-time positioning of network faults.
In a first aspect, the present application provides a method for locating a fault in a communication network, where the communication network includes a first end-side device, a second end-side device, and N switches, the first end-side device and the second end-side device transmit a data stream through the N switches, and N is a positive integer. The method includes: acquiring the network traffic of the data stream as it flows through each of S switches among the N switches, where S is a positive integer less than or equal to N; determining, according to the network traffic of the data stream at each of the S switches, P switches at which the network traffic of the data stream is abnormal, where P is a positive integer less than or equal to S; and locating the network fault in the communication network according to the position of each of the P switches in the communication network and the network traffic information of the data stream at each of the P switches, where the network traffic information includes the value of each index in at least one communication index.
In this method, a data stream is transmitted between the first end-side device and the second end-side device through N switches. P switches at which the network traffic of the data stream is abnormal are determined from the acquired network traffic of the data stream at each of S switches among the N switches, and the network fault is located according to the position of each of the P switches in the communication network and the network traffic information of the data stream at each of them, where N, S and P are positive integers and N ≥ S ≥ P. This solves the problem of locating network faults in the communication network. In addition, as soon as the acquired network traffic shows that an abnormal switch exists among the switches through which the data stream flows, the fault locating process is triggered immediately, which improves the efficiency and real-time performance of fault location in the communication network.
In a possible implementation manner, the acquiring the network traffic of the data stream as it flows through each of S switches among the N switches includes: acquiring the network traffic collected by each of S traffic probes to obtain the network traffic of the data stream at each of the S switches, where the S traffic probes correspond one-to-one to the S switches, and each traffic probe collects the network traffic flowing through its corresponding switch.
In this implementation, S traffic probes in one-to-one correspondence with the S switches acquire the network traffic of the data stream at each of the S switches, which improves the efficiency of acquiring that traffic.
In a possible implementation manner, the determining, according to the network traffic of the data stream at each of the S switches, the P switches at which the network traffic of the data stream is abnormal and the network traffic information of the data stream at each of the P switches includes: analyzing the network traffic of the data stream at each of the S switches to obtain the network traffic information of the data stream at each of the S switches; and judging that network traffic information against a preset network traffic evaluation criterion to obtain the P switches, among the S switches, at which the network traffic of the data stream is abnormal, together with the network traffic information of the data stream at each of the P switches, where the network traffic evaluation criterion indicates the health value of each index in the at least one communication index.
In this implementation, the acquired network traffic of the data stream at each switch is analyzed to obtain network traffic information, and that information is judged against a preset network traffic evaluation criterion that indicates the health value of each index in at least one communication index. This improves the accuracy of identifying the switches at which the network traffic of the data stream is abnormal.
In one possible implementation manner, the locating a network fault in the communication network according to the position of each of the P switches in the communication network and the network traffic information of the data stream at each of the P switches includes: when P is greater than 2, if a first value of a first index in first network traffic information corresponding to a first switch among the P switches is not equal to a second value of the first index in second network traffic information corresponding to a second switch, and the first index is an abnormal index, determining that a network fault corresponding to the first index has occurred on the transmission path between the first switch and the second switch.
In this implementation, when more than 2 switches show abnormal network traffic for the data stream, and the value of an abnormal first index at a first such switch differs from the value of the same index at a second such switch, it is determined that a network fault corresponding to the first index has occurred on the transmission path between the first switch and the second switch. This improves the accuracy of fault location in the communication network.
In one possible implementation manner, the locating a network fault in the communication network according to the location of each of the P switches in the communication network and the network traffic information of the data flow in each of the P switches includes: if a third switch in the P switches is a switch closest to the first end-side device in the S switches, determining that a network fault corresponding to a second index occurs on a transmission path before the third switch, where the second index is an index in which an abnormality occurs in network traffic information corresponding to the third switch.
In this implementation, if a third switch among the switches showing abnormal network traffic for the data stream is, of all the switches on the path of the data stream from which network traffic is acquired, the one closest to the first end-side device, it is determined that a network fault corresponding to a second index has occurred on the transmission path before the third switch, where the second index is the index that is abnormal in the network traffic information corresponding to the third switch. This improves the accuracy of fault location in the communication network.
In one possible implementation, the data stream is one of a plurality of data streams in the communication network, and accordingly the method further includes: if the faulty transmission paths corresponding to all of the plurality of data streams contain the same communication device, determining that this communication device is faulty.
In this implementation, when the data stream is one of a plurality of data streams in the communication network and the faulty transmission paths of all of those data streams contain the same communication device, that device is determined to be faulty, which improves the accuracy of fault location in the communication network.
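The shared-device rule above amounts to a set intersection over the faulty paths. The following is a minimal sketch under assumptions: the function name `common_faulty_devices` and the representation of a faulty path as a list of device identifiers are hypothetical and are not taken from the application.

```python
from typing import Iterable, List, Set


def common_faulty_devices(faulty_paths: Iterable[List[str]]) -> Set[str]:
    """Return the devices that appear on every faulty transmission path.

    Each path is a list of hypothetical device identifiers, one path per
    data stream whose transmission path was found to be faulty.
    """
    paths = [set(path) for path in faulty_paths]
    if not paths:
        return set()
    common = paths[0]
    for path in paths[1:]:
        common = common & path  # keep only devices shared by all paths so far
    return common


# Three data streams whose faulty paths all pass through "sw2".
suspects = common_faulty_devices([
    ["sw1", "sw2"],
    ["sw2", "sw3"],
    ["sw4", "sw2"],
])
```

An empty result means no single device is shared by all faulty paths, in which case this rule does not apply.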
In one possible implementation, the at least one communication index includes one or more of the following: traffic, number of concurrent connections, link establishment ratio, connection non-response rate, connection failure rate, link termination ratio, retransmission rate, packet loss rate, time delay, response time, response rate, or number of service zero-window occurrences.
In a second aspect, the present application provides a fault location apparatus for a communication network, where the communication network includes a first end-side device, a second end-side device, and N switches, where the first end-side device and the second end-side device transmit data streams through the N switches, where N is a positive integer, the apparatus includes: an obtaining module, configured to obtain a network traffic when the data stream flows through each of S switches of the N switches, where S is a positive integer smaller than or equal to N; a determining module, configured to determine, according to a network traffic of the data stream flowing through each of the S switches, P switches where the network traffic of the data stream is abnormal, where P is a positive integer smaller than or equal to S; a positioning module, configured to position a network fault in the communication network according to a position of each switch in the P switches in the communication network and network traffic information of each switch in the P switches of the data stream, where the network traffic information includes a value of each indicator in at least one communication indicator.
In a possible implementation manner, the obtaining module is specifically configured to: acquire the network traffic collected by each of S traffic probes to obtain the network traffic of the data stream at each of the S switches, where the S traffic probes correspond one-to-one to the S switches, and each traffic probe collects the network traffic flowing through its corresponding switch.
In a possible implementation manner, the determining module is specifically configured to: analyze the network traffic of the data stream at each of the S switches to obtain the network traffic information of the data stream at each of the S switches; and judge that network traffic information against a preset network traffic evaluation criterion to obtain the P switches, among the S switches, at which the network traffic of the data stream is abnormal, together with the network traffic information of the data stream at each of the P switches, where the network traffic evaluation criterion indicates the health value of each index in the at least one communication index.
In a possible implementation manner, the positioning module is specifically configured to: when P is greater than 2, if a first value of a first index in first network traffic information corresponding to a first switch among the P switches is not equal to a second value of the first index in second network traffic information corresponding to a second switch, and the first index is an abnormal index, determine that a network fault corresponding to the first index has occurred on the transmission path between the first switch and the second switch.
In a possible implementation manner, the positioning module is specifically configured to: if a third switch in the P switches is a switch closest to the first end-side device in the S switches, determining that a network fault corresponding to a second index occurs on a transmission path before the third switch, where the second index is an index in which an abnormality occurs in network traffic information corresponding to the third switch.
In one possible implementation, the data stream is one of a plurality of data streams in the communication network, and accordingly the positioning module is further configured to: if the faulty transmission paths corresponding to all of the plurality of data streams contain the same communication device, determine that this communication device is faulty.
In one possible implementation, the at least one communication index includes one or more of the following: traffic, number of concurrent connections, link establishment ratio, connection non-response rate, connection failure rate, link termination ratio, retransmission rate, packet loss rate, time delay, response time, response rate, or number of service zero-window occurrences.
The beneficial effects of the second aspect and various possible implementations of the second aspect can be seen in the beneficial effects of the first aspect and various possible implementations of the first aspect, and are not described herein again.
In a third aspect, the present application provides a fault location apparatus for a communication network. The apparatus may include a processor coupled with a memory. Wherein the memory is configured to store program code and the processor is configured to execute the program code in the memory to implement the method of the first aspect or any one of the implementations.
Optionally, the apparatus may further comprise the memory.
In a fourth aspect, the present application provides a chip comprising at least one processor and a communication interface, the communication interface and the at least one processor are interconnected by a line, and the at least one processor is configured to execute a computer program or instructions to perform the method according to the first aspect or any one of the possible implementations thereof.
In a fifth aspect, the present application provides a computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method according to the first aspect or any one of its possible implementations.
In a sixth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to the first aspect or any one of its possible implementations.
In a seventh aspect, the present application provides a computing device comprising at least one processor and a communication interface, the communication interface and the at least one processor being interconnected by a line, the communication interface being in communication with a target system, the at least one processor being configured to execute a computer program or instructions to perform the method according to the first aspect or any one of the possible implementations.
In an eighth aspect, the present application provides a computing system comprising at least one processor and a communication interface, the communication interface and the at least one processor being interconnected by a line, the communication interface being in communication with a target system, the at least one processor being configured to execute a computer program or instructions to perform the method according to the first aspect or any one of the possible implementations thereof.
Drawings
FIG. 1 is a diagram of a system architecture according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for locating a fault in a communication network according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for locating a fault in a communication network according to an embodiment of the present application;
FIG. 4 is a schematic block diagram of a fault location device of a communication network according to one embodiment of the present application;
FIG. 5 is a schematic structural diagram of a fault location apparatus of a communication network according to an embodiment of the present application.
Detailed Description
Technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the fault location method and the related apparatus for the communication network disclosed in the present application may be used in the technical field of information security, and may also be used in any fields other than the technical field of information security.
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in fig. 1, the network fault location system 100 includes a communication network 110 and a network fault location server 120.
Communication network 110 may include a first end-side device 111, a second end-side device 112, and switch 1, switch 2, …, switch N. The first end-side device 111 and the second end-side device 112 perform data communication through the switches, and the number of switches may be one or more, which is not limited in the present application.
The communication network 110 may comprise more end-side devices, the first end-side device 111 and the second end-side device 112 being only any two end-side devices of the plurality of end-side devices in the communication network 110. The end-side devices can be clients and servers. In general, the first end-side device is a client, and the second end-side device is a server.
Network fault location server 120 is a device for providing location of network faults in a communication network. The network fault location server 120 may be a blade server, a rack server, or the like, and the network fault location server 120 may also be deployed in a server cluster in the cloud, which is not limited in this application.
It is to be understood that the system architecture shown in fig. 1 is merely one example of a network fault location system provided herein, and in other embodiments of the present application, the network fault location system 100 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Fig. 2 is a flowchart illustrating a method for locating a fault in a communication network according to an embodiment of the present application, where as shown in fig. 2, the method at least includes S201 to S203.
S201: obtain the network traffic of the data stream as it flows through each of S switches among the N switches, where S is a positive integer less than or equal to N.
The communication network comprises a first end side device, a second end side device and X switches, wherein the first end side device and the second end side device transmit data streams through N switches in the X switches in the communication network, N is a positive integer, and X is a positive integer greater than or equal to N.
More end-side devices may be included in the communication network, the first end-side device and the second end-side device being only any two of the plurality of end-side devices in the communication network. The end-side devices include clients and servers.
The network traffic of the data stream is obtained at each of S switches among the N switches. When S is equal to N, the traffic is obtained at every switch the data stream passes through; when S is smaller than N, the traffic is obtained at only some of the switches the data stream passes through.
There are several possible implementations of obtaining the network traffic of the data stream at each of the S switches among the N switches:
In a possible implementation manner, S traffic probes are deployed in the communication network in one-to-one correspondence with the S switches. Each traffic probe collects the network traffic flowing through its corresponding switch, and the network traffic of the data stream at each of the S switches is obtained from the traffic collected by the S traffic probes.
Illustratively, the traffic probe may be a sniffer probe.
In another possible implementation, the network traffic of the data stream at each of the S switches in the communication network is collected by the telemetry function of the S switches.
Illustratively, the telemetry function of each of the S switches in the communication network is enabled, and the telemetry functions cooperate to collect the traffic exchanged between servers.
In yet another possible implementation, the communication traffic between servers and within servers is collected on the servers through the Transmission Control Protocol (TCP).
Illustratively, the server runs a TCP collection script at a preset period and, at a random moment within a time window, reports the network traffic of the data stream over the Hypertext Transfer Protocol (HTTP) to the collector corresponding to each of the S switches.
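The periodic collection with a randomized report moment can be sketched as a small scheduling helper. This is an illustration under assumptions only: the function name `report_times`, the parameters `period_s` and `window_s`, and the choice of a uniform random offset are hypothetical; the application does not specify how the random moment is chosen, and the HTTP upload itself is omitted.

```python
import random
from typing import List


def report_times(period_s: float, window_s: float, cycles: int,
                 rng: random.Random) -> List[float]:
    """Return one report time per collection cycle, in seconds.

    Each cycle starts at i * period_s. The report is sent at a uniformly
    random offset inside the window that opens at the start of the cycle
    (an assumed reading of "randomly reports ... in a time window").
    """
    return [i * period_s + rng.uniform(0.0, window_s) for i in range(cycles)]


# Hypothetical values: collect every 60 s, report within a 10 s window.
times = report_times(period_s=60.0, window_s=10.0, cycles=3,
                     rng=random.Random(0))
```

Randomizing the report moment would spread uploads from many servers over the window instead of having them all arrive at the same instant, although the application does not state this motivation.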
S202: determine, according to the network traffic of the data stream at each of the S switches, P switches at which the network traffic of the data stream is abnormal, where P is a positive integer less than or equal to S.
A switch at which the network traffic of the data stream is abnormal is a switch whose acquired network traffic shows a network fault such as packet loss or delay.
In a possible implementation manner, the network traffic of the data stream at each of the S switches is analyzed to obtain the network traffic information of the data stream at each of the S switches. That information is then judged against a preset network traffic evaluation criterion to obtain the P switches, among the S switches, at which the network traffic of the data stream is abnormal, together with the network traffic information of the data stream at each of the P switches, where the network traffic evaluation criterion indicates the health value of each index in at least one communication index.
As an example, the network traffic information includes the value of each index in the at least one communication index, and the network traffic evaluation criterion indicates the health value of each index in the at least one communication index. The communication indices may include traffic, the number of concurrent connections, the link establishment ratio, the connection non-response rate, the connection failure rate, the link termination ratio, the retransmission rate, the packet loss rate, the time delay, the response time, the response rate, the number of service zero-window occurrences, and the like.
As an example, given the multiple collection modes and the mechanisms of network faults, 12 indices across 4 dimensions (service trend, link establishment/link disconnection, transmission performance, and load interaction) are selected to build a traffic health index system. Data from each collection mode is abstracted and converted according to this index system, which masks the differences between data sources and decouples data collection from upper-layer analysis.
Illustratively, the service trend dimension in the established traffic health index system comprises two indexes of traffic and concurrent connection, the link establishment/link disconnection dimension comprises four indexes of link establishment proportion, connection non-response rate, connection failure rate and link termination proportion, the transmission performance dimension comprises three indexes of retransmission rate, packet loss rate and time delay, and the load interaction dimension comprises three indexes of response time, response rate and service 0 window times.
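The four dimensions and twelve indices, and the conversion of source-specific data onto them, can be written down as a small data structure. This is a minimal sketch: the identifiers below are hypothetical English renderings of the index names, and the `normalize` helper with its `field_map` argument is an assumed way of masking differences between collection modes, not a mechanism described in the application.

```python
from typing import Dict, List

# Traffic health index system: 4 dimensions, 12 indices (names are assumed).
HEALTH_INDEX_SYSTEM: Dict[str, List[str]] = {
    "service_trend": ["traffic", "concurrent_connections"],
    "link_establishment_disconnection": [
        "link_establishment_ratio",
        "connection_non_response_rate",
        "connection_failure_rate",
        "link_termination_ratio",
    ],
    "transmission_performance": [
        "retransmission_rate",
        "packet_loss_rate",
        "delay",
    ],
    "load_interaction": ["response_time", "response_rate", "zero_window_count"],
}


def normalize(raw: Dict[str, float], field_map: Dict[str, str]) -> Dict[str, float]:
    """Convert one collector's raw fields into the common index names.

    field_map maps a collector-specific field name to an index name;
    fields that are unmapped or not part of the index system are dropped.
    """
    known = {name for names in HEALTH_INDEX_SYSTEM.values() for name in names}
    converted: Dict[str, float] = {}
    for source_field, value in raw.items():
        index_name = field_map.get(source_field)
        if index_name in known:
            converted[index_name] = value
    return converted


# A hypothetical probe that reports retransmissions under the field "rtx".
sample = normalize({"rtx": 0.02, "vendor_extra": 1.0},
                   {"rtx": "retransmission_rate"})
```

With one `field_map` per collection mode (probe, switch telemetry, server-side TCP script), the analysis layer only ever sees the common index names.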
After the index system is constructed, the network traffic can be analyzed with an intelligent algorithm. By learning from a large amount of monitoring data, the algorithm forms an evaluation criterion for the health of network traffic and, combined with expert rules, can identify traffic abnormalities with high accuracy. The system can also inspect traffic across the whole network in real time, detect abnormalities, and output prompt information to assist troubleshooting.
Illustratively, an intelligent algorithm is used to analyze a large amount of monitored network traffic data to obtain a network traffic evaluation criterion, which indicates a health value of each index in at least one index in a traffic health index system.
As an example, judging the network traffic information of the data stream at each of the S switches against a preset network traffic evaluation criterion to obtain the P switches at which the network traffic of the data stream is abnormal includes: comparing the actual value of each communication index in the network traffic information of the data stream at each of the S switches with the health value of the same index in the preset network traffic evaluation criterion, and marking a switch as one at which the network traffic of the data stream is abnormal when the actual value of a communication index in its network traffic information is smaller than the corresponding health value.
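The comparison against health values can be sketched as follows. The function name `find_abnormal_switches` and the dictionary layout are hypothetical. The sketch follows the text literally and flags an index when its actual value is smaller than its health value; whether that direction suits every index (for example, rates where lower is better) is not addressed in the text, and no assumption about it is made here.

```python
from typing import Dict, List


def find_abnormal_switches(
    traffic_info: Dict[str, Dict[str, float]],
    health_values: Dict[str, float],
) -> Dict[str, List[str]]:
    """Return, for each abnormal switch, the list of its abnormal indices.

    traffic_info maps a switch identifier to its index values.
    An index is flagged when actual < health value, as stated in the text.
    """
    abnormal: Dict[str, List[str]] = {}
    for switch, indices in traffic_info.items():
        flagged = [
            name
            for name, actual in indices.items()
            if name in health_values and actual < health_values[name]
        ]
        if flagged:
            abnormal[switch] = flagged
    return abnormal


# Hypothetical values: the response rate should not drop below 0.95.
result = find_abnormal_switches(
    {"sw1": {"response_rate": 0.99}, "sw2": {"response_rate": 0.80}},
    {"response_rate": 0.95},
)
```

The keys of the returned mapping are the P abnormal switches, and the values identify which indices triggered the mark, which the locating step in S203 relies on.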
S203, locate the network fault in the communication network according to the position of each of the P switches in the communication network and the network traffic information of the data stream at each of the P switches, where the network traffic information comprises a value of each of at least one communication index.
In a possible implementation manner, when P is greater than 2, that is, when more than 2 of the switches traversed by the data stream between the first end-side device and the second end-side device have abnormal network traffic information, if a first value of a first index in first network traffic information corresponding to a first switch among the P switches is not equal to a second value of the first index in second network traffic information corresponding to a second switch, and the first index is an index in which an abnormality occurs, it is determined that a network fault corresponding to the first index occurs on the transmission path between the first switch and the second switch.
Here, a network fault corresponding to the first index occurring on the transmission path between the first switch and the second switch means that the fault occurs on one or more communication devices or links on that transmission path, where the transmission path includes the first switch and the second switch.
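The rule above can be sketched as a scan over the abnormal switches in path order. This is an illustrative sketch only: the function name and data layout are hypothetical, and comparing each abnormal switch with the next abnormal switch along the path is an assumption made here, since the text only speaks of a first and a second switch.

```python
def locate_between_switches(path, abnormal_info, index_name):
    """Return (upstream, downstream) switch pairs whose values for index_name differ.

    path:          switch ids in the order the data stream traverses them.
    abnormal_info: {switch_id: {index_name: value}} for the P abnormal switches.
    Switches that are not among the P switches, or that lack the index,
    are skipped. Each returned pair bounds a transmission path (including
    both switches) on which the fault for index_name is suspected.
    """
    observed = [
        s for s in path
        if s in abnormal_info and index_name in abnormal_info[s]
    ]
    segments = []
    for first, second in zip(observed, observed[1:]):
        if abnormal_info[first][index_name] != abnormal_info[second][index_name]:
            segments.append((first, second))
    return segments
```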
In another possible implementation manner, if a third switch of the P switches is a switch closest to the first end-side device among the S switches, it is determined that a network fault corresponding to a second index occurs on a transmission path before the third switch, where the second index is an index in which an abnormality occurs in network traffic information corresponding to the third switch.
Here, the switch closest to the first end-side device means the switch closest to the first end-side device along the transmission path of the data stream, not the switch closest to it in physical distance.
As an example, the data stream passes, in sequence, through the first end-side device, the first switch, the second switch, and the second end-side device. When the first value of the first index in the first network traffic information corresponding to the first switch is 0 and the second value of the first index in the second network traffic information corresponding to the second switch is 1, packet loss occurs as the data stream flows from the first switch to the second switch. When the first value and the second value are both 1, packet loss occurs as the data stream flows from the first end-side device to the first switch. When the first value and the second value are both 0 and the second end-side device reports packet loss, packet loss occurs as the data stream flows from the second switch to the second end-side device.
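The three cases in this example amount to a small decision table. The sketch below encodes it directly; the function name, the parameter names, and the returned strings are hypothetical labels chosen for illustration.

```python
def locate_packet_loss(first_value, second_value, end_device_reports_loss=False):
    """Map the packet-loss flags (0 or 1) seen at two switches to a path segment.

    Assumed path: first end-side device -> first switch -> second switch
    -> second end-side device. Returns None when none of the three cases
    described in the text applies.
    """
    if first_value == 0 and second_value == 1:
        return "first switch -> second switch"
    if first_value == 1 and second_value == 1:
        return "first end-side device -> first switch"
    if first_value == 0 and second_value == 0 and end_device_reports_loss:
        return "second switch -> second end-side device"
    return None
```

Combinations the example does not cover, such as a flag of 1 at the first switch and 0 at the second, deliberately return `None` rather than a guess.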
As another example, the data stream passes, in sequence, through the first end-side device, the first switch, the second switch, and the second end-side device, and the network traffic information includes two indexes: link establishment delay and server response delay. When the values of the link establishment delay and the server response delay in the first network traffic information corresponding to the first switch are both 0, and the values of both indexes in the second network traffic information corresponding to the second switch are both 1, the delay occurs as the data stream flows from the first switch to the second switch. When the values of both indexes are 1 at the first switch and also 1 at the second switch, the delay occurs as the data stream flows from the first end-side device to the first switch. When the values of both indexes are 0 at the first switch and also 0 at the second switch, the delay occurs at the second end-side device.
As another example, the data stream flows from the first end-side device to the second end-side device through multiple switches, but only the third network traffic information of the third switch among those switches is collected. When the second index is packet loss: if the value of client packet loss in the third network traffic information is 1, packet loss occurs as the data stream flows from the first end-side device to the third switch; if the value of server packet loss in the third network traffic information is 1, packet loss occurs on the return traffic, that is, as the data stream flows from the second end-side device to the third switch.
As another example, the data stream flows from the first end-side device to the second end-side device through multiple switches, but only the third network traffic information of the third switch among those switches is collected. When the second index is delay: if the value of the server response delay in the third network traffic information is 1 and the value of the link establishment delay is 0, the delay occurs on the first end-side device side; if the values of the server response delay and the link establishment delay in the third network traffic information are both 1, the delay occurs on the second end-side device side.
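The two single-collection-point examples can be combined into one lookup. The sketch below follows the flag-to-location mapping exactly as the text states it; the dictionary keys, the function name, and the returned strings are hypothetical.

```python
def locate_from_single_switch(info):
    """Interpret 0/1 flags collected at only one switch on the path.

    info may contain the keys "client_packet_loss", "server_packet_loss",
    "server_response_delay", and "link_establishment_delay"; missing keys
    are treated as not observed. Returns a list of findings.
    """
    findings = []
    if info.get("client_packet_loss") == 1:
        findings.append("packet loss: first end-side device -> switch")
    if info.get("server_packet_loss") == 1:
        findings.append("packet loss: second end-side device -> switch")

    server_delay = info.get("server_response_delay")
    link_delay = info.get("link_establishment_delay")
    if server_delay == 1 and link_delay == 0:
        findings.append("delay: first end-side device side")
    elif server_delay == 1 and link_delay == 1:
        findings.append("delay: second end-side device side")
    return findings
```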
In another possible implementation manner, the data stream is one of a plurality of network flows in the communication network, and if the failed transmission paths corresponding to all the data streams in the plurality of network flows include the same communication device, that communication device is determined to have failed.
As an example, when the number of network flows with packet loss in the communication network exceeds a preset threshold within a time window, a large number of data streams have suffered concentrated packet loss in a short time. If the transmission paths with packet loss corresponding to all the data streams in the plurality of network flows include the same communication device, that communication device is determined to be losing packets, where the communication device may be an access switch, an aggregation switch, a server, or the like.
According to this technical scheme, the network fault in the communication network is located according to the positions, in the communication network, of the switches with abnormal network traffic information among the switches traversed by the data stream, and according to the network traffic information of each of those switches. This enables real-time, automatic troubleshooting of network faults in the communication network, improves the efficiency of fault location, and saves time and human resources.
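The shared-device rule can be sketched as a set intersection over the failed paths observed in one time window. The function name, the input layout, and the device identifiers in the usage note are hypothetical; how the time window is formed is outside this sketch.

```python
def common_failed_devices(failed_paths, threshold):
    """Return the devices present on every failed transmission path.

    failed_paths: {flow_id: iterable of device ids on that flow's failed path},
                  already restricted to one time window.
    threshold:    preset threshold on the number of flows with packet loss.
    An empty set is returned when the number of failed flows does not
    exceed the threshold, or when no device is shared by all paths.
    """
    if not failed_paths or len(failed_paths) <= threshold:
        return set()
    path_sets = [set(path) for path in failed_paths.values()]
    return set.intersection(*path_sets)
```

For instance, with three flows whose failed paths all traverse a device named `agg1` and a threshold of 2, the function returns `{"agg1"}`.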
Fig. 3 is a flowchart illustrating a method for locating a fault in a communication network according to an embodiment of the present application. As shown in fig. 3, the method includes at least S301 to S304.
S301, acquire the network traffic of the data stream as it flows through each of S switches among the N switches, where S is a positive integer less than or equal to N.
It should be noted that S301 may refer to S201, and details are not repeated here.
S302, detect the network traffic of the data stream as it flows through each of the S switches to obtain the network traffic information of the data stream at each of the S switches.
As an example, the network traffic information includes a value of each of at least one communication index, and the communication index may include traffic, number of concurrent connections, link establishment ratio, connection non-response rate, connection failure rate, link termination ratio, retransmission rate, packet loss rate, time delay, response time, response rate, service zero-window count, and the like.
It should be noted that, the method for detecting the network traffic of the data stream flowing through the switch to obtain the network traffic information of the data stream flowing through the switch may refer to the existing method for obtaining the network traffic information according to the network traffic, and details are not described here.
S303, judge the network traffic information of the data stream at each of the S switches based on a preset network traffic evaluation standard to obtain P switches, among the S switches, at which the network traffic of the data stream is abnormal, together with the network traffic information of the data stream at each of the P switches, where P is a positive integer less than or equal to S.
A switch with abnormal network traffic of the data stream is a switch whose acquired network traffic exhibits a network fault such as packet loss or delay. The network traffic evaluation standard indicates a health value for each of the at least one communication index.
In a possible implementation manner, given the multiple data acquisition sources and the mechanisms of network faults, 12 indexes across 4 dimensions (service trend, link establishment/teardown, transmission performance, and load interaction) are selected to build a traffic health index system. Data from each acquisition method is abstracted and converted according to this index system, which masks the differences between data sources and decouples data acquisition from the upper-layer analysis functions.
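One way to picture the abstraction step is a per-collector field mapping onto shared index names, so that the analysis layer never sees collector-specific fields. This is only a sketch under assumptions: the function name, the `field_map` structure, and the raw field names in the usage note are all hypothetical, and the source does not describe how the conversion is actually performed.

```python
def normalize_sample(raw_sample, field_map):
    """Convert one collector's raw record into shared index names.

    raw_sample: {raw_field_name: value} as produced by a specific collector.
    field_map:  {raw_field_name: shared_index_name} for that collector.
    Raw fields that the map does not mention are dropped, so upper-layer
    analysis only ever sees names from the shared index system.
    """
    return {
        field_map[field]: value
        for field, value in raw_sample.items()
        if field in field_map
    }
```

For example, a collector that reports a field called `rtx_ratio` could be mapped with `{"rtx_ratio": "retransmission_rate"}`, while another collector with a different field name would use its own map and yield the same shared key.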
Illustratively, in the established traffic health index system, the service trend dimension comprises two indexes: traffic and number of concurrent connections. The link establishment/teardown dimension comprises four indexes: link establishment ratio, connection non-response rate, connection failure rate, and link termination ratio. The transmission performance dimension comprises three indexes: retransmission rate, packet loss rate, and time delay. The load interaction dimension comprises three indexes: response time, response rate, and service zero-window count.
After the characteristic index system is constructed, an intelligent algorithm can be used to analyze the network traffic condition. By learning from a large amount of monitoring data, the algorithm forms an evaluation standard for the health of network traffic and, combined with expert rules, can identify network traffic abnormalities with high accuracy. The system can also inspect traffic across the whole network in real time, detect abnormalities, and output prompt information to assist troubleshooting.
Illustratively, an intelligent algorithm analyzes a large amount of monitored network traffic data to obtain the network traffic evaluation standard, which indicates a health value for each of at least one index in the traffic health index system.
As an example, judging the network traffic information of the data stream at each of the S switches based on the preset network traffic evaluation standard, to obtain the P switches among the S switches at which the network traffic of the data stream is abnormal, includes: comparing the actual value of each communication index in the network traffic information of the data stream at each of the S switches with the health value of the corresponding communication index in the preset network traffic evaluation standard; and marking a switch as a switch with abnormal network traffic of the data stream if, in its network traffic information, the actual value of a communication index is smaller than the health value of that communication index in the preset network traffic evaluation standard.
S304, locate the network fault in the communication network according to the position of each of the P switches in the communication network and the network traffic information of the data stream at each of the P switches.
It should be noted that S304 may refer to S203, and details are not repeated here.
According to this technical scheme, network communication quality is analyzed through a well-designed index system, and traditional coarse problems such as access failure and slow access are refined into finer-grained categories such as abnormal session establishment, abnormal transmission delay, and abnormal transmission packet loss. This meets the network monitoring requirements of different traffic models and different technology stacks, improves fault perception capability, and further improves the accuracy of fault location in the communication network.
Fig. 4 is a schematic configuration diagram of a fault location device of a communication network according to an embodiment of the present application. As shown in fig. 4, the apparatus 400 may include an acquisition module 401, a determination module 402, and a location module 403.
Any module of the acquisition module, the determination module and the positioning module in the embodiments of the present application may be wholly or partially implemented by software and/or hardware. The part realized by software can be run on the processor to realize corresponding functions, and the part realized by hardware can be a constituent part of the processor.
The apparatus 400 may be used to implement the methods shown in fig. 2 or fig. 3.
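For the software-implemented case, the three modules of apparatus 400 can be pictured as three pluggable callables run in sequence. The sketch below is hypothetical throughout: the class name, the field names, and the call signatures are assumptions made for illustration, and the source does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class FaultLocationApparatus:
    """Hypothetical software counterpart of apparatus 400."""

    # Acquisition module 401: path of switches -> per-switch network traffic.
    acquire: Callable[[List[str]], Dict[str, Any]]
    # Determination module 402: per-switch traffic -> abnormal switches and their info.
    determine: Callable[[Dict[str, Any]], Dict[str, Any]]
    # Positioning module 403: path and abnormal info -> located faults.
    locate: Callable[[List[str], Dict[str, Any]], List[str]]

    def run(self, path: List[str]) -> List[str]:
        """Run acquisition, determination, and positioning in order."""
        traffic = self.acquire(path)
        abnormal = self.determine(traffic)
        return self.locate(path, abnormal)
```

Passing the modules in as callables keeps data acquisition separate from analysis, so a different collector can be substituted without touching the positioning logic.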
Fig. 5 is a schematic structural diagram of a fault location apparatus of a communication network according to an embodiment of the present application. The apparatus 500 shown in fig. 5 may be used to perform the method described in any of the previous embodiments.
As shown in fig. 5, the apparatus 500 of the present embodiment includes: memory 501, processor 502, communication interface 503, and bus 504. The memory 501, the processor 502 and the communication interface 503 are connected to each other by a bus 504.
The memory 501 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 501 may store a program; when the program stored in the memory 501 is executed by the processor 502, the processor 502 performs the steps of the methods shown in fig. 2 or fig. 3.
The processor 502 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the method for locating a fault in a communication network according to the embodiment of the present application.
The processor 502 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method of the embodiments of the present application may be implemented by integrated logic circuits of hardware or instructions in the form of software in the processor 502.
The processor 502 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 501, and the processor 502 reads the information in the memory 501 and, in combination with its hardware, performs the functions required by each method in the embodiments of the present application, for example, the steps/functions of the embodiments shown in fig. 2 or fig. 3.
The communication interface 503 may use a transceiver apparatus, such as but not limited to a transceiver, to enable communication between the apparatus 500 and other devices or communication networks.
Bus 504 may include a path that transfers information between various components of apparatus 500 (e.g., memory 501, processor 502, communication interface 503).
It should be understood that the apparatus 500 shown in the embodiments of the present application may be an electronic device, or may also be a chip configured in the electronic device.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or the computer program are loaded or executed on a computer, the procedures or functions according to the embodiments of the present application are wholly or partially generated. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B can be singular or plural. In addition, the "/" in this document generally indicates that the associated objects before and after it are in an "or" relationship, but may also indicate an "and/or" relationship, which may be understood with reference to the surrounding text.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (12)
1. A fault positioning method of a communication network is characterized in that the communication network comprises a first end-side device, a second end-side device and N switches, the first end-side device and the second end-side device transmit data streams through the N switches, N is a positive integer, and the method comprises the following steps:
acquiring network flow when the data flow flows through each of S switches in the N switches, wherein S is a positive integer less than or equal to N;
determining P switches with abnormal network flow of the data flow according to the network flow of the data flow flowing through each switch in the S switches, wherein P is a positive integer less than or equal to S;
and positioning the network fault in the communication network according to the position of each switch in the P switches in the communication network and the network traffic information of each switch in the P switches of the data stream, wherein the network traffic information comprises the value of each index in at least one communication index.
2. The method of claim 1, wherein the obtaining network traffic for the data flow as it flows through each of S switches of the N switches comprises:
obtaining the network traffic collected by each of S traffic probes, so as to obtain the network traffic of the data stream as it flows through each of the S switches, wherein the S traffic probes correspond to the S switches one to one, and each of the S traffic probes is used for collecting the network traffic flowing through the corresponding switch among the S switches.
3. The method according to claim 1, wherein the determining P switches with abnormal network traffic of the data flow according to the network traffic of the data flow flowing through each switch of the S switches, and the network traffic information of each switch of the P switches of the data flow comprises:
detecting the network flow when the data flow flows through each switch in the S switches to obtain the network flow information when the data flow flows through each switch in the S switches;
and judging the network flow information when the data flow flows through each switch in the S switches based on a preset network flow evaluation standard to obtain P switches with abnormal network flow of the data flow in the S switches and the network flow information of each switch of the data flow in the P switches, wherein the network flow evaluation standard indicates the health value of each index in the at least one communication index.
4. The method of any of claims 1 to 3, wherein said locating the network failure in the communication network based on the location of said each of the P switches in the communication network and the network traffic information of the data flow at each of the P switches comprises:
when P is greater than 2, if a first value of a first index in first network traffic information corresponding to a first switch among the P switches is not equal to a second value of the first index in second network traffic information corresponding to a second switch, and the first index is an index in which an abnormality occurs, determining that a network fault corresponding to the first index occurs on a transmission path between the first switch and the second switch.
5. The method of any of claims 1 to 3, wherein said locating the network failure in the communication network based on the location of said each of the P switches in the communication network and the network traffic information of the data flow at each of the P switches comprises:
if a third switch in the P switches is a switch closest to the first end-side device in the S switches, determining that a network fault corresponding to a second index occurs on a transmission path before the third switch, where the second index is an index in which an abnormality occurs in network traffic information corresponding to the third switch.
6. The method according to any of claims 1 to 3, wherein the data flow is one of a plurality of network flows in the communication network, and accordingly, the method further comprises:
if the failed transmission paths corresponding to all the data streams in the plurality of network flows include the same communication device, determining that the same communication device has failed.
7. The method of claim 1, wherein the at least one communication metric comprises one or more of: traffic, number of concurrent connections, link establishment ratio, connection non-response rate, connection failure rate, link termination ratio, retransmission rate, packet loss rate, time delay, response time, response rate, or service zero-window count.
8. A fault location device of a communication network, wherein the communication network includes a first end-side device, a second end-side device and N switches, the first end-side device and the second end-side device transmit data streams through the N switches, N is a positive integer, the device includes:
an obtaining module, configured to obtain a network traffic when the data stream flows through each of S switches of the N switches, where S is a positive integer smaller than or equal to N;
a determining module, configured to determine, according to a network traffic of the data stream flowing through each of the S switches, P switches where the network traffic of the data stream is abnormal, where P is a positive integer smaller than or equal to S;
a positioning module, configured to position a network fault in the communication network according to a position of each switch in the P switches in the communication network and network traffic information of each switch in the P switches of the data stream, where the network traffic information includes a value of each indicator in at least one communication indicator.
9. A fault location device for a communication network, comprising: a memory and a processor;
the memory is to store program instructions;
the processor is configured to invoke program instructions in the memory to perform the method of any of claims 1 to 7.
10. A chip comprising at least one processor and a communication interface, the communication interface and the at least one processor being interconnected by a line, the at least one processor being configured to execute a computer program or instructions to perform the method of any one of claims 1 to 7.
11. A computer-readable medium, characterized in that the computer-readable medium stores program code for computer execution, the program code comprising instructions for performing the method of any one of claims 1 to 7.
12. A computer program product comprising instructions that, when executed, cause a computer to perform the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111603641.2A CN114172796B (en) | 2021-12-24 | 2021-12-24 | Fault positioning method and related device for communication network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114172796A (en) | 2022-03-11 |
CN114172796B (en) | 2024-01-30 |
Family
ID=80488121
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111603641.2A Active CN114172796B (en) | 2021-12-24 | 2021-12-24 | Fault positioning method and related device for communication network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114172796B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9438471B1 (en) * | 2012-02-20 | 2016-09-06 | F5 Networks, Inc. | Multi-blade network traffic management apparatus with improved failure handling and methods thereof |
CN107835098A (en) * | 2017-11-28 | 2018-03-23 | 车智互联(北京)科技有限公司 | A kind of network fault detecting method and system |
CN110380907A (en) * | 2019-07-26 | 2019-10-25 | 京信通信系统(中国)有限公司 | A kind of network fault diagnosis method, device, the network equipment and storage medium |
CN113162800A (en) * | 2021-03-12 | 2021-07-23 | 电子科技大学 | Network link performance index abnormity positioning method based on reinforcement learning |
WO2021244415A1 (en) * | 2020-06-03 | 2021-12-09 | 华为技术有限公司 | Network failure detection method and apparatus |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117193272A (en) * | 2023-11-07 | 2023-12-08 | 常州华纳电气有限公司 | Electronic control test data management system and method based on big data |
CN117193272B (en) * | 2023-11-07 | 2024-01-26 | 常州华纳电气有限公司 | Electronic control test data management system and method based on big data |
Also Published As
Publication number | Publication date |
---|---|
CN114172796B (en) | 2024-01-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||