WO2022247675A1

WO2022247675A1 - Device operation and maintenance method, network device, and storage medium

Info

Publication number: WO2022247675A1
Application number: PCT/CN2022/093050
Authority: WO
Inventors: 张洁
Original assignee: 中兴通讯股份有限公司
Priority date: 2021-05-24
Filing date: 2022-05-16
Publication date: 2022-12-01
Also published as: CN115396295A

Abstract

Embodiments of the present disclosure provide a device operation and maintenance method, a network device, and a storage medium. The network device is configured with two management modules and a plurality of monitoring modules, wherein the two management modules are in a master-standby relationship. On this basis, the method comprises: monitoring modules monitor a target object, and send feedback information to a master management module of two management modules when a preset condition is satisfied, wherein the feedback information is used for instructing the master management module to implement a preset operation and maintenance action; and a standby management module of the two management modules switches to be a master management module when determining that an anomaly occurs in the current master management module.

Description

Equipment operation and maintenance method, network equipment and storage medium

Cross References to Related Applications

This disclosure claims the priority of the Chinese patent application CN202110566486.5 filed on May 24, 2021, entitled "Equipment Operation and Maintenance Method, Network Equipment and Storage Medium", the entire content of which is incorporated into this disclosure by reference.

Technical Field

The present disclosure relates to the field of communication technologies, and in particular, to a device operation and maintenance method, network device and storage medium.

Background

In traditional network management technology, network devices are usually operated and maintained as follows: the network management platform captures abnormal information in the protocol information sent by a network device, and management personnel maintain the network device according to that abnormal information. For example, the network device sends NETCONF (Network Configuration Protocol) information or SNMP (Simple Network Management Protocol) information to the network management platform; when the platform captures abnormal information, it notifies the management personnel through human-computer interaction, and the management personnel analyze the information and maintain the network device according to the analysis results.

It can be seen that the traditional technology is not intelligent and therefore has certain limitations. For example, it perceives faults in network devices with a certain lag, which delays operation and maintenance; as another example, it requires a lot of manpower and material resources, which makes operation and maintenance costly.

Summary

Based on this, the embodiments of the present disclosure provide a device operation and maintenance method, a network device, and a storage medium, so as to solve the problem of limitation in the operation and maintenance mode of the network device in the traditional technology.

In a first aspect, an embodiment of the present disclosure provides a device operation and maintenance method applied to a network device. The network device is configured with two management modules and several monitoring modules, and the two management modules are in a master-backup relationship with each other. The method includes: the monitoring module monitors a target object and, when a preset condition is met, sends feedback information to the main management module of the two management modules, where the feedback information is used to instruct the main management module to implement a preset operation and maintenance action; and the standby management module of the two management modules switches to be the main management module when it determines that the main management module is abnormal.

In a second aspect, an embodiment of the present disclosure provides a network device including a processor and a memory; the memory is used to store a computer program; and the processor is used to execute the computer program and, when executing the computer program, implement the device operation and maintenance method described in the first aspect.

In a third aspect, an embodiment of the present disclosure provides a computer-readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the processor implements the device operation and maintenance method described in the first aspect.

Brief Description of the Drawings

FIG. 1 is a schematic configuration diagram of a network device in an embodiment of the present disclosure;

FIG. 2 is a schematic flow diagram of a device operation and maintenance method in an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of two management modules sharing a communication interface in an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of an application scenario in an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of another application scenario in the embodiments of the present disclosure;

FIG. 6 is a schematic flow diagram of the standby management module determining whether the main management module is abnormal in an embodiment of the present disclosure;

FIG. 7 is a schematic structural block diagram of a network device in an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of this specification are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of this specification. Based on the embodiments in this specification, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.

The flow charts shown in the drawings are just illustrations, and do not necessarily include all contents and operations/steps, nor must they be performed in the order described. For example, some operations/steps can be decomposed, combined or partly combined, so the actual order of execution may change according to the actual situation.

Some implementations of this specification will be described in detail below with reference to the accompanying drawings. In the case of no conflict, the following embodiments and features in the embodiments can be combined with each other.

A device operation and maintenance method provided by an embodiment of the present disclosure may be applied to a network device, such as a switch or a router. For example, the network device may be a service switch, a service router, an access router, and the like. As shown in FIG. 1, the network device is configured with two management modules and several monitoring modules, where the two management modules are in a master-backup relationship with each other, that is, one is the main management module and the other is the standby management module. In some implementations, the management modules may include processes running on the device system. The monitoring modules may also include processes running on the device system, and in some implementations may additionally include hardware with which the network device is equipped, such as sensors.

On this basis, the device operation and maintenance method in the embodiment of the present disclosure, as shown in FIG. 2, may include, but is not limited to, step S10 and step S20.

Step S10, the monitoring module monitors the target object, and sends feedback information to the main management module of the two management modules when preset conditions are met, wherein the feedback information is used to instruct the main management module to implement preset operation and maintenance actions.

Step S20, when the standby management module of the two management modules determines that an abnormality occurs in the current main management module, it switches to the main management module.

Device operation and maintenance can be understood as follows: when the operating status of a device does not meet expectations, a manager performs a series of operations on the device so that it operates as expected. Accordingly, in this embodiment, the monitoring module can monitor the target object in real time and send feedback information to the main management module when the preset condition is met, so that the main management module implements the preset operation and maintenance action according to the received feedback information. The target object is an object related to the operation of the network device. For example, the target object may include, but is not limited to, at least one of the following: other devices communicatively connected to the network device, hardware of the network device itself, and applications and processes running on the system. The preset condition is a preset trigger condition, and the preset operation and maintenance action is a series of preset specific operations, that is, the operation and maintenance action that needs to be performed when the preset condition is met. In addition, one target object may be monitored by one or more monitoring modules, and one monitoring module may monitor one or more target objects, which is not limited in this embodiment.

It can be understood that the network device can realize automatic operation and maintenance through the configured management modules and monitoring modules, which makes the network device intelligent and addresses the limitations of operation and maintenance in the traditional technology. For example, whereas the traditional technology suffers from lagging and costly operation and maintenance, this embodiment, through the management modules and monitoring modules, can both detect faults and carry out operation and maintenance in time and effectively reduce the cost of manpower and material resources.

As discussed above, the two management modules are in a master-backup relationship, that is, each serves as the redundant backup of the other. On this basis, the standby management module can switch to be the main management module when it is determined that the main management module is abnormal, and continue to perform the tasks that the main management module needs to perform. In one embodiment, the standby management module monitors in real time whether the main management module is abnormal and, if an abnormality is determined, switches to be the main management module. Alternatively, another module monitors in real time whether the main management module is abnormal and, if an abnormality is determined, notifies the standby management module so that the standby management module switches to be the main management module. Compared with an implementation in which only one management module is provided, this embodiment can therefore effectively improve the reliability of the network device, that is, make the network device robust. For example, suppose the two management modules are a first management module and a second management module, which are the main management module and the standby management module respectively. While the monitoring module is monitoring and the first management module is implementing operation and maintenance actions, the second management module switches to be the main management module when it determines that the first management module is abnormal, so that a device abnormality caused by the first management module can be avoided.
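The interplay of steps S10 and S20 can be sketched roughly as follows. This is only a minimal Python illustration, not the disclosed implementation: the class names `ManagementModule` and `MonitoringModule`, the `healthy` flag, the role strings, and the 0.9 threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ManagementModule:
    """Hypothetical management module; `role` is "main" or "standby"."""
    name: str
    role: str
    healthy: bool = True
    actions: List[str] = field(default_factory=list)

    def on_feedback(self, feedback: str) -> None:
        # Stand-in for the preset operation and maintenance action.
        self.actions.append(f"handled: {feedback}")


@dataclass
class MonitoringModule:
    """Hypothetical monitor that reports to whichever module is currently main."""
    condition: Callable[[float], bool]

    def observe(self, value: float, main: ManagementModule) -> None:
        # Step S10: feedback is sent only when the preset condition is met.
        if self.condition(value):
            main.on_feedback(f"value={value}")


def current_main(a: ManagementModule, b: ManagementModule) -> ManagementModule:
    return a if a.role == "main" else b


def switch_if_abnormal(main: ManagementModule, standby: ManagementModule) -> None:
    # Step S20: the standby takes over when the main module is abnormal.
    if not main.healthy:
        main.role, standby.role = "abnormal", "main"


first = ManagementModule("first", "main")
second = ManagementModule("second", "standby")
monitor = MonitoringModule(condition=lambda v: v > 0.9)

monitor.observe(0.95, current_main(first, second))   # handled by `first`
first.healthy = False                                # simulate an abnormality
switch_if_abnormal(first, second)
monitor.observe(0.97, current_main(first, second))   # handled by `second`
```

In this sketch the monitoring module never needs to know which module failed; it simply reports to whichever module currently holds the main role.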

In one embodiment, as shown in FIG. 3, the two management modules communicate with the monitoring module through the same communication interface. The communication interface is used to realize data interaction between the management modules and the monitoring module, and may include processes running on the device system, and the like. In this way, after the standby management module switches to be the main management module, because the two management modules share the same communication interface, the switched management module can continue to receive the feedback information sent by the monitoring module through the originally established communication link. Here, the originally established communication link is the communication link established, based on the communication interface, between the abnormal management module and the monitoring module. For example, suppose the two management modules are a first management module and a second management module, which are the main management module and the standby management module respectively. The first management module has already established a communication link through the communication interface, so when an abnormality occurs in the first management module, the second management module can continue to receive the feedback information sent by the monitoring module directly through that communication link.

It can therefore be understood that the standby management module does not need to re-establish the communication link when switching, and can interact with the monitoring module through the established communication link. Compared with an implementation in which the two management modules communicate with the monitoring module through different communication interfaces, this embodiment effectively shortens the switching time of the standby management module, thereby improving the reliability of the network device. As a counterexample, suppose the two management modules are a first management module and a second management module, which are the main management module and the standby management module respectively, where the first management module communicates with the monitoring module through a first communication interface and the second management module communicates with the monitoring module through a second communication interface. When an abnormality occurs in the first management module, the second management module needs to establish a communication link with the monitoring module based on the second communication interface when switching, which makes the switching time too long. In this embodiment, by contrast, the two management modules share the same communication interface and no communication link needs to be re-established, so the switching time is effectively shortened.
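The benefit of the shared communication interface can be illustrated with a small Python sketch. It models the interface as a single in-process queue, which is purely an assumption for illustration; the class name `SharedInterface` and the `links_established` counter are hypothetical and do not come from the disclosure.

```python
import queue


class SharedInterface:
    """Hypothetical communication interface shared by both management modules."""

    def __init__(self) -> None:
        self._link = queue.Queue()
        self.links_established = 1  # the link is set up once, by the first main module

    def send(self, feedback: str) -> None:
        # Called by the monitoring module.
        self._link.put(feedback)

    def receive(self) -> str:
        # Called by whichever management module is currently main.
        return self._link.get_nowait()


interface = SharedInterface()

# The monitoring module sends feedback; the first (main) module reads it.
interface.send("feedback-1")
seen_by_first = interface.receive()

# The first module becomes abnormal. The second module takes over and reads
# from the same interface without establishing a new link.
interface.send("feedback-2")
seen_by_second = interface.receive()
```

With separate interfaces, the takeover would instead have to include a link setup step before `receive()` could succeed, which is the delay the shared interface avoids.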

In an embodiment, step S10 may include the following: the monitoring module monitors the target object based on a preset monitoring strategy issued by the main management module, and sends feedback information to the main management module when the monitored operating condition of the target object meets the trigger condition set by the preset monitoring strategy.

The main management module can send the preset monitoring strategy to the monitoring module in advance, so that the monitoring module monitors the target object in real time according to the preset monitoring strategy and, when the trigger condition set by the preset monitoring strategy is met, sends feedback information to the main management module. The feedback information is used to instruct the main management module to implement the operation and maintenance action.
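One way to picture a preset monitoring strategy is as a small record that pairs a trigger condition with the feedback to send. The Python sketch below assumes this representation; the `MonitoringStrategy` type, the metric names, and the threshold values are hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MonitoringStrategy:
    """Hypothetical strategy record issued by the main management module."""
    metric: str                        # which reading of the target object to watch
    trigger: Callable[[float], bool]   # trigger condition set by the strategy
    feedback: str                      # feedback sent when the trigger fires


def evaluate(strategies: List[MonitoringStrategy], sample: Dict[str, float]) -> List[str]:
    """Return the feedback for every strategy whose trigger condition is met."""
    return [
        s.feedback
        for s in strategies
        if s.metric in sample and s.trigger(sample[s.metric])
    ]


# Strategies issued in advance by the main management module (assumed values).
issued = [
    MonitoringStrategy("cpu_utilization", lambda v: v > 0.85, "third feedback"),
    MonitoringStrategy("rx_mbps", lambda v: v > 100.0, "second feedback"),
]

# One monitoring sample: only the CPU trigger condition is met.
feedback = evaluate(issued, {"cpu_utilization": 0.91, "rx_mbps": 40.0})
```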

In an embodiment, the preset monitoring strategy may include at least one of a first monitoring strategy, a second monitoring strategy and a third monitoring strategy.

The first monitoring strategy is used to instruct the monitoring module to monitor a target device connected to the network device, and to send first feedback information to the main management module when it detects connection oscillation between the target device and the network device. The first feedback information is used to instruct the main management module to disconnect the connection between the network device and the target device, and to re-establish that connection after a first preset time period. In one embodiment, connection oscillation between the target device and the network device means that the connection between the two devices is repeatedly established and disconnected within a certain period of time, for example, the connection is established and disconnected more than 3 times within 10 minutes. Long-term connection oscillation may cause oscillation across the entire network route. To avoid this, when the monitoring module detects connection oscillation between the target device and the network device, it sends the first feedback information to the main management module, so that the main management module disconnects the connection between the network device and the target device and prevents the entire network route from oscillating. After a certain period of time, the main management module re-establishes the connection between the network device and the target device, so as to restore the normal operation of the network device. Exemplarily, as shown in FIG. 4, the network device is communicatively connected to a first device and a second device respectively, and the three form a network route. On this basis, the monitoring module monitors the first device (that is, the target device) in real time based on the first monitoring strategy, and sends the first feedback information (which can be reasonably set according to the actual situation) to the main management module when connection oscillation is detected. The main management module then disconnects the connection between the network device and the first device, thereby preventing the entire network route from oscillating (for example, preventing the connection between the network device and the second device from oscillating), and re-establishes the connection between the two after the first preset time period (for example, 60 minutes later), after which the network device operates normally.
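The oscillation check described above can be sketched as a sliding-window counter. The following Python example is an illustration under stated assumptions: it counts connection establishment and disconnection events, treats more than 3 events within 10 minutes as oscillation (the example figures from the text), and uses the hypothetical names `OscillationDetector` and `reconnect_at`.

```python
from collections import deque


class OscillationDetector:
    """Hypothetical detector: flags oscillation when more than `max_events`
    connection events fall within the last `window_s` seconds."""

    def __init__(self, window_s: float = 600.0, max_events: int = 3) -> None:
        self.window_s = window_s
        self.max_events = max_events
        self._events = deque()

    def record(self, timestamp_s: float) -> bool:
        """Record one establish/disconnect event; return True if oscillating."""
        self._events.append(timestamp_s)
        # Drop events that have fallen out of the sliding window.
        while timestamp_s - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) > self.max_events


def reconnect_at(disconnect_s: float, hold_s: float = 3600.0) -> float:
    """Time at which the main module re-establishes the connection,
    assuming a first preset time period of 60 minutes."""
    return disconnect_s + hold_s


detector = OscillationDetector()
results = [detector.record(t) for t in (0.0, 100.0, 200.0, 300.0)]
# The fourth event within 10 minutes exceeds the limit of 3.
```

When `record` returns True, the monitoring module would send the first feedback information, and the main management module would disconnect and schedule the reconnection at `reconnect_at(...)`.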

The second monitoring strategy is used to instruct the monitoring module to monitor a target service, and to send second feedback information to the main management module when it detects that the received traffic of the target service exceeds a first preset threshold. The second feedback information is used to instruct the main management module to limit the bandwidth of other services and/or increase the bandwidth of the target service. If the received traffic of the target service increases while its bandwidth remains unchanged, the quality of service (QoS) of the target service is affected. To avoid this, when the monitoring module detects that the received traffic of the target service exceeds the first preset threshold, it sends the second feedback information to the main management module, so that the main management module limits the bandwidth of other services and/or increases the bandwidth of the target service to guarantee the quality of service of the target service. In one embodiment, the target service may include a video conferencing service. For example, as shown in FIG. 5, the monitoring module monitors the video conferencing service (that is, the target service), and sends second feedback information (which can be reasonably set according to the actual situation) to the main management module when it detects that the received traffic of the video conferencing service exceeds the first preset threshold (which can be reasonably set according to the actual situation), so that the main management module can limit the bandwidth of other services and/or increase the bandwidth of the target service.
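A possible reading of the bandwidth adjustment is sketched below in Python. The disclosure does not say how bandwidth is redistributed, so the even split across the other services, the function name `rebalance`, and all figures are assumptions made for illustration.

```python
from typing import Dict


def rebalance(
    bandwidth_mbps: Dict[str, float],
    target: str,
    received_mbps: float,
    threshold_mbps: float,
    boost_mbps: float,
) -> Dict[str, float]:
    """Return a new allocation; the input mapping is left unchanged."""
    if received_mbps <= threshold_mbps:
        return dict(bandwidth_mbps)  # trigger condition not met: no action

    others = [name for name in bandwidth_mbps if name != target]
    if not others:
        return dict(bandwidth_mbps)  # nothing to take bandwidth from

    share = boost_mbps / len(others)
    updated = dict(bandwidth_mbps)
    for name in others:
        # Limit each other service, never below zero.
        updated[name] = max(0.0, bandwidth_mbps[name] - share)
    # Give the target service exactly what was taken from the others.
    reclaimed = sum(bandwidth_mbps[n] - updated[n] for n in others)
    updated[target] = bandwidth_mbps[target] + reclaimed
    return updated


before = {"video_conference": 40.0, "backup": 30.0, "web": 30.0}
after = rebalance(before, "video_conference", received_mbps=50.0,
                  threshold_mbps=35.0, boost_mbps=10.0)
```

Keeping the total allocation constant is a design choice of this sketch; an implementation could equally raise the target bandwidth without limiting other services, as the "and/or" in the text allows.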

The third monitoring strategy is used to instruct the monitoring module to monitor target hardware, and to send third feedback information to the main management module when the monitored utilization rate of the target hardware exceeds a second preset threshold. The third feedback information is used to instruct the main management module to adjust the running state of the target hardware and/or the running state of other hardware. In one embodiment, the target hardware may include a CPU of the network device. When the monitoring module detects that the utilization rate of the CPU exceeds the second preset threshold (which can be reasonably set according to the actual situation), it sends third feedback information (which can be reasonably set according to the actual situation) to the main management module, so that the main management module can adjust the running state of the target hardware (for example, increase the operating frequency of the CPU) and/or the running state of other hardware (for example, increase the output power of the fan), so as to ensure the normal operation of the network device.

It should be noted that in other implementation manners, the preset monitoring strategy may also include other strategies, such as service process status monitoring, service quality monitoring, etc., which are not limited in this embodiment.

In an embodiment, the method may further include: predetermining one of the two management modules as the main management module. In one embodiment, before the monitoring module starts monitoring, the network device needs to determine one of the two management modules as the main management module. Based on a preset selection strategy, one of the two management modules may be determined as the main management module according to the respective ID values of the two management modules; for example, the management module with the smaller ID value is determined as the main management module.
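The example selection strategy (smaller ID value wins) can be written in a few lines of Python. The `Module` type and the function name `elect_main` are hypothetical, and the sketch assumes the two ID values are distinct.

```python
from typing import NamedTuple, Tuple


class Module(NamedTuple):
    module_id: int
    name: str


def elect_main(a: Module, b: Module) -> Tuple[Module, Module]:
    """Return (main, standby); the module with the smaller ID becomes main.

    Assumes the two IDs are distinct.
    """
    return (a, b) if a.module_id < b.module_id else (b, a)


main, standby = elect_main(Module(7, "second"), Module(3, "first"))
```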

In an embodiment, the method may further include: performing data synchronization processing on the two management modules. Because the two management modules back each other up as master and standby, their data needs to be synchronized. In one embodiment, data synchronization can be performed on the two management modules at regular intervals, for example, every 5 minutes; in other embodiments, when data changes in the main management module, the same change can be applied synchronously to the standby management module.
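The two synchronization styles mentioned above, periodic and change-driven, can be contrasted in a short Python sketch. The `ReplicatedState` class and its method names are hypothetical, and the timer that would call `full_sync` every 5 minutes is left out for brevity.

```python
import copy
from typing import Any, Dict


class ReplicatedState:
    """Hypothetical state holder: the main copy is authoritative,
    the standby copy is a replica."""

    def __init__(self) -> None:
        self.main: Dict[str, Any] = {}
        self.standby: Dict[str, Any] = {}

    def full_sync(self) -> None:
        # Periodic synchronization (for example, invoked every 5 minutes).
        self.standby = copy.deepcopy(self.main)

    def update(self, key: str, value: Any, sync_on_change: bool = False) -> None:
        self.main[key] = value
        if sync_on_change:
            # Change-driven synchronization: mirror the single change at once.
            self.standby[key] = copy.deepcopy(value)


state = ReplicatedState()
state.update("strategy", {"cpu_threshold": 0.85})           # not yet replicated
stale = dict(state.standby)
state.full_sync()                                           # periodic sync catches up
state.update("peer", "first-device", sync_on_change=True)   # replicated immediately
```

Periodic synchronization is simpler but can leave the standby copy behind by up to one interval, whereas change-driven synchronization keeps the replica current at the cost of work on every update.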

In an embodiment, the standby management module determining that the main management module is abnormal may include the following: the standby management module periodically sends a preset detection message to the main management module; and, each time after sending the detection message, if no response message returned by the main management module is received within a second preset time period, the standby management module determines that the main management module is abnormal. As shown in FIG. 6, after the standby management module sends the detection message, if it receives a response message, it determines that the main management module is not abnormal and continues to send detection messages to the main management module; if no response message is received within the second preset time period, it determines that the main management module is abnormal and switches to be the main management module. For example, if no response message is received within three times the sending period, the main management module is determined to be abnormal.
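The detection loop of FIG. 6 can be sketched in Python as follows. To keep the example deterministic, the actual message exchange is abstracted into a hypothetical `probe` callable; the function name `watch_main`, the 1-second period, and the timeout of three periods are assumptions for illustration.

```python
from typing import Callable, Optional


def watch_main(
    probe: Callable[[float], bool],
    timeout_s: float,
    max_rounds: int,
) -> Optional[int]:
    """Probe the main module once per round.

    `probe(timeout_s)` is a hypothetical callable that sends one detection
    message and returns True if a response arrives within `timeout_s`.
    Returns the index of the first unanswered round, or None if every
    round was answered.
    """
    for round_index in range(max_rounds):
        if not probe(timeout_s):
            return round_index  # main module deemed abnormal: switch over
    return None


period_s = 1.0
timeout_s = 3 * period_s  # assumed: second preset time period of three periods

# Simulated replies from the main module: the third probe goes unanswered.
replies = iter([True, True, False, True])
failed_round = watch_main(lambda _timeout: next(replies), timeout_s, max_rounds=10)
```

A real `probe` would send the detection message over the link between the two management modules and block for at most `timeout_s` while waiting for the response.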

In an embodiment, after the standby management module is switched to be the main management module, the method may further include: the main management module restarts the abnormal management module, and instructs the abnormal management module to switch to the standby management module after restarting. Exemplarily, the two management modules are respectively the first management module and the second management module. At this time, the first management module and the second management module are respectively the main management module and the backup management module. If the first management module is abnormal, then The second management module switches to the main management module. After the switching is completed, the second management module restarts the first management module, and instructs the first management module to switch to the standby management module after restarting, so as to restore the normal operation of the first management module.

An embodiment of the present disclosure also provides a network device which, as shown in FIG. 7, includes a processor and a memory. The memory is used to store a computer program; the processor is used to execute the computer program and, when executing the computer program, implement any one of the device operation and maintenance methods provided by the embodiments of the present disclosure.

It should be understood that the processor may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor, or any conventional processor and the like.

An embodiment of the present disclosure also provides a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, the processor implements any one of the device operation and maintenance methods provided by the embodiments of the present disclosure.

Those of ordinary skill in the art can understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and devices, can be implemented as software, firmware, hardware, or an appropriate combination thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer-readable storage media (or non-transitory media) and communication media (or transitory media).

As known to those of ordinary skill in the art, the term computer-readable storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Exemplarily, the computer-readable storage medium may be an internal storage unit of the network device described in the foregoing embodiments, such as a hard disk or a memory of the network device. The computer-readable storage medium may also be an external storage device of the network device, such as a plug-in hard disk equipped on the network device, a smart media card (SMC), a secure digital (SD) card, a flash card, and the like.

The above is only a specific embodiment of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person familiar with the technical field can readily conceive of various equivalent modifications or replacements within the technical scope of the present disclosure, and such modifications or replacements shall be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be determined by the protection scope of the claims.

Claims

1. A device operation and maintenance method applied to a network device, wherein the network device is configured with two management modules and several monitoring modules, the two management modules are in a master-backup relationship with each other, and the method comprises:

the monitoring module monitoring a target object, and sending feedback information to the main management module of the two management modules when a preset condition is met, wherein the feedback information is used to instruct the main management module of the two management modules to implement a preset operation and maintenance action; and

the standby management module of the two management modules switching to be the main management module when determining that the main management module of the two management modules is abnormal.

2. The method according to claim 1, wherein the two management modules communicate with the monitoring module through the same communication interface, so that the standby management module of the two management modules, after switching to be the main management module, continues to receive the feedback information sent by the monitoring module through the originally established communication link, wherein the originally established communication link is a communication link established, based on the communication interface, between the abnormal management module and the monitoring module.

3. The method according to claim 1, wherein the monitoring module monitoring the target object, and sending feedback information to the main management module of the two management modules when the preset condition is met, comprises:

the monitoring module monitoring the target object based on a preset monitoring strategy issued by the main management module, and sending feedback information to the main management module when the monitored operating status of the target object meets the trigger condition set by the preset monitoring strategy.

4. The method according to claim 3, wherein the preset monitoring strategy comprises at least one of a first monitoring strategy, a second monitoring strategy and a third monitoring strategy;

the first monitoring strategy is used to instruct the monitoring module to monitor a target device connected to the network device, and to send first feedback information to the main management module when detecting connection oscillation between the target device and the network device; the first feedback information is used to instruct the main management module to disconnect the connection between the network device and the target device, and to re-establish the connection between the network device and the target device after a first preset time period;

the second monitoring strategy is used to instruct the monitoring module to monitor a target service, and to send second feedback information to the main management module when detecting that the received traffic of the target service exceeds a first preset threshold, wherein the target service includes a video conference service; the second feedback information is used to instruct the main management module to limit the bandwidth of other services and/or increase the bandwidth of the target service;

the third monitoring strategy is used to instruct the monitoring module to monitor target hardware, and to send third feedback information to the main management module when detecting that the utilization rate of the target hardware exceeds a second preset threshold, wherein the target hardware includes a CPU; the third feedback information is used to instruct the main management module to adjust the running state of the target hardware and/or the running state of other hardware.

5. The method according to any one of claims 1-4, wherein the method further comprises:

predetermining one of the two management modules as the main management module; and/or

performing data synchronization processing on the two management modules.

6. The method according to claim 5, wherein said determining one of the two management modules as the main management module comprises:

based on a preset selection strategy, determining one of the two management modules as the main management module according to the respective ID values of the two management modules.

7. The method according to any one of claims 1-4, wherein the standby management module of the two management modules determining that the main management module of the two management modules is abnormal comprises:

the standby management module periodically sending a preset detection message to the main management module; and

the standby management module determining that the main management module is abnormal if, each time after sending the detection message, it does not receive a response message returned by the main management module within a second preset time period.

8. The method according to any one of claims 1-4, wherein after the standby management module of the two management modules switches to be the main management module, the method further comprises:

the main management module restarting the abnormal management module, and instructing the abnormal management module to switch to be the standby management module after restarting.

9. A network device, including a processor and a memory;

the memory is used to store a computer program;

the processor is configured to execute the computer program and, when executing the computer program, implement the device operation and maintenance method according to any one of claims 1 to 8.

10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor implements the device operation and maintenance method according to any one of claims 1 to 8.