WO2022247675A1 - Device operation and maintenance method, network device, and storage medium - Google Patents
- Publication number
- WO2022247675A1 (application PCT/CN2022/093050)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- management module
- module
- monitoring
- modules
- main
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/66—Arrangements for connecting between networks having differing types of switching systems, e.g. gateways
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0663—Performing the actions predefined by failover planning, e.g. switching to standby network elements
- H04L41/08—Configuration management of networks or network elements
- H04L41/0876—Aspects of the degree of configuration automation
Definitions
- The device operation and maintenance method provided by the embodiments of the present disclosure may be applied to a network device such as a switch or a router.
- For example, the network device may be a service switch, a service router, or an access router.
- The network device is configured with two management modules and several monitoring modules. The two management modules are in a master-standby relationship, that is, one acts as the main module and the other as the backup. In some implementations, each management module may be a process running on the device's system. A monitoring module may also be such a process, but in some implementations it may instead include hardware of the network device, such as a sensor.
- The device operation and maintenance method in the embodiments of the present disclosure may include, but is not limited to, steps S10 and S20.
- Step S10: the monitoring module monitors the target object. When a preset condition is met, it sends feedback information to the main management module of the two management modules. The feedback information instructs the main management module to perform a preset operation and maintenance action.
- Step S20: when the standby management module of the two management modules determines that the current main management module is abnormal, it switches to become the main management module.
- Device operation and maintenance can be understood as a series of operations that an administrator performs on a device whose operating status does not meet expectations, so that the device again operates as expected. In this embodiment, the monitoring module therefore monitors the target object in real time. When the preset condition is met, it sends feedback information to the main management module, which then performs the preset operation and maintenance action according to that feedback.
- The target object is an object related to the operation of the network device.
- The target object may include, but is not limited to, at least one of the following: another device communicatively connected to the network device, hardware of the network device itself, and applications and processes running on the device's system.
- A preset condition is a preset trigger condition.
- A preset operation and maintenance action is a preset series of specific operations, that is, the operation and maintenance that must be performed once the preset condition is met.
- One target object may be monitored by one or more monitoring modules, and one monitoring module may monitor one or more target objects; this embodiment does not limit the mapping.
- Through the configured management modules and monitoring modules, the network device can perform operation and maintenance automatically. This makes the device intelligent and addresses the limitations of traditional operation and maintenance of network devices.
- Traditional operation and maintenance suffers from delay and high cost. In contrast, the management and monitoring modules of this embodiment can detect and handle faults promptly, and they also effectively reduce manpower and material costs.
- When the standby management module determines that the main management module is abnormal, it switches to become the main management module and continues the tasks the main management module was performing.
- In one embodiment, the standby management module itself monitors the main management module in real time and switches over once it determines that an abnormality has occurred. Alternatively, another module monitors the main management module in real time and, on detecting an abnormality, notifies the standby management module, which then switches to become the main management module.
- This embodiment can effectively improve the reliability, that is, the robustness, of the network device.
- Suppose the two management modules are a first management module and a second management module, acting as the main and standby management modules respectively. Suppose further that, while the monitoring module is monitoring and the first management module is performing operation and maintenance actions, the second management module determines that the first management module is abnormal. The second management module then switches to become the main management module, so that a device abnormality caused by the first management module can be avoided.
- In one embodiment, the two management modules communicate with the monitoring module through the same communication interface. The communication interface carries data exchange between the management modules and the monitoring module, and may include, for example, a process running on the device's system.
- After switching, the new main management module can continue to receive feedback information from the monitoring module over the originally established communication link. This is the link established over the communication interface between the monitoring module and the management module that became abnormal.
- Suppose again that the first and second management modules are the main and standby management modules respectively. The first management module has already established a communication link over the shared communication interface. Therefore, when the first management module becomes abnormal, the second management module can continue to receive feedback information from the monitoring module directly over that link.
- The standby management module therefore does not need to re-establish a communication link when switching, and it can interact with the monitoring module over the existing link. Consider an implementation in which the two management modules communicate with the monitoring module through different communication interfaces. Compared with that implementation, this embodiment effectively shortens the switching time of the standby management module and thereby improves the reliability of the network device.
- For comparison, suppose the first management module (main) communicates with the monitoring module through a first communication interface, and the second management module (standby) communicates with it through a second communication interface. When switching, the second management module must then establish a communication link with the monitoring module over the second communication interface, which makes the switching time too long. Because the two management modules in this embodiment share the same communication interface, no new link has to be established, and switching time is effectively saved.
- In one embodiment, step S10 may include the following: the monitoring module monitors the target object based on a preset monitoring strategy issued by the main management module. When the monitored operating condition of the target object meets the trigger condition set by that strategy, the monitoring module sends feedback information to the main management module.
- That is, the main management module can issue the preset monitoring strategy to the monitoring module in advance, and the monitoring module then monitors the target object in real time according to the strategy. When the trigger condition set by the strategy is met, the monitoring module sends feedback information instructing the main management module to perform the operation and maintenance action.
- The preset monitoring strategy may include at least one of a first monitoring strategy, a second monitoring strategy, and a third monitoring strategy.
- The first monitoring strategy instructs the monitoring module to monitor a target device connected to the network device. When the monitoring module detects connection oscillation between the target device and the network device, it sends first feedback information to the main management module. The first feedback information instructs the main management module to disconnect the network device from the target device and to re-establish the connection after a first preset time period.
- Connection oscillation between the target device and the network device means that the connection between the two devices is repeatedly established and torn down within a certain period. An example is more than 3 establish/disconnect events within 10 minutes. Long-lasting connection oscillation may cause route oscillation across the entire network. To prevent this, when the monitoring module detects connection oscillation between the target device and the network device, it can send the first feedback information to the main management module. The main management module then disconnects the network device from the target device and thereby avoids network-wide route oscillation.
- After the first preset time period, the main management module can re-establish the connection between the network device and the target device, restoring the normal operation of the network device.
- For example, suppose the network device is communicatively connected to a first device and a second device, and the three form a network route. The monitoring module monitors the first device (that is, the target device) in real time based on the first monitoring strategy. On detecting connection oscillation, it sends first feedback information (whose content can be set according to the actual situation) to the main management module.
- The main management module then disconnects the network device from the first device. This keeps oscillation from spreading across the network route; for example, the connection between the network device and the second device is kept from oscillating. After the first preset time period (for example, 60 minutes), the main management module re-establishes the connection, and the network device operates normally.
- The second monitoring strategy instructs the monitoring module to monitor a target service. When the received traffic of the target service exceeds a first preset threshold, the monitoring module sends second feedback information to the main management module. The second feedback information instructs the main management module to limit the bandwidth of other services and/or increase the bandwidth of the target service.
- That is, when the monitoring module detects that the received traffic of the target service exceeds the first preset threshold, it can send the second feedback information to the main management module. The main management module then limits the bandwidth of other services and/or increases the bandwidth of the target service, guaranteeing the quality of service of the target service.
- The target service may include a video conferencing service. For example, as shown in FIG. …, the monitoring module monitors the video conferencing service (that is, the target service). When the received traffic of that service exceeds the first preset threshold, it sends second feedback information (whose content can be set according to the actual situation) to the main management module. The main management module can then limit the bandwidth of other services and/or increase the bandwidth of the target service.
- The third monitoring strategy instructs the monitoring module to monitor target hardware. When the monitored utilization of the target hardware exceeds a second preset threshold, the monitoring module sends third feedback information to the main management module. The third feedback information instructs the main management module to adjust the running state of the target hardware and/or other hardware.
- For example, the target hardware may include the CPU of the network device. When the monitoring module detects that CPU utilization exceeds the second preset threshold (which can be set according to the actual situation), it sends third feedback information to the main management module.
- The main management module can then adjust the running state of the target hardware (for example, by raising the CPU's operating frequency) and/or of other hardware (for example, by increasing the fan's output power) to keep the network device operating normally.
- The preset monitoring strategy may also include other strategies, such as service process status monitoring and service quality monitoring, which this embodiment does not limit.
- In one embodiment, the method may further include determining in advance one of the two management modules as the main management module.
- That is, the network device needs to designate one of the two management modules as the main management module.
- It may do so based on the respective ID values of the two management modules. For example, according to a preset selection policy, the management module with the smaller ID value is designated as the main management module.
- In one embodiment, the method may further include synchronizing data between the two management modules. Because the two modules back each other up, their data must be kept synchronized. In one embodiment, synchronization is performed at regular intervals, for example every 5 minutes. In other embodiments, whenever data in the main management module changes, the corresponding data in the standby management module is changed synchronously.
- In one embodiment, the standby management module determines that the main management module is abnormal as follows. The standby management module periodically sends a preset detection message to the main management module. If it receives no response message from the main management module within a second preset time period, it determines that the main management module is abnormal.
- That is, after the standby management module sends a detection message, if it receives a response message, it determines that the main management module is not abnormal and continues sending detection messages. If it receives no response within the second preset time period, it determines that the main management module is abnormal and switches to become the main management module. For example, if no response message is received within three sending periods, the main management module is determined to be abnormal.
- In one embodiment, the method may further include: the main management module restarts the abnormal management module and instructs it to become the standby management module after restarting.
- For example, suppose the first and second management modules are the main and standby management modules respectively. If the first management module becomes abnormal, the second management module switches to become the main management module. After the switchover completes, the second management module restarts the first management module and instructs it to become the standby management module after restarting, restoring its normal operation.
- An embodiment of the present disclosure also provides a network device, as shown in FIG. 7, including a processor and a memory. The memory is configured to store a computer program. The processor is configured to execute the computer program and, when executing it, to implement any device operation and maintenance method provided by the embodiments of the present disclosure.
- The processor may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- The general-purpose processor may be a microprocessor or any conventional processor.
- An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it causes the processor to implement any device operation and maintenance method provided by the embodiments of the present disclosure.
- The functional modules/units in the systems and devices can be implemented as software, firmware, hardware, or an appropriate combination thereof.
- The division into functional modules/units in the above description does not necessarily correspond to the division of physical components. For example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation.
- Some or all of the physical components may be implemented in any of three ways: as software executed by a processor such as a central processing unit, digital signal processor, or microprocessor; as hardware; or as an integrated circuit such as an application-specific integrated circuit.
- Such software may be distributed on computer-readable media, which may include computer-readable storage media (or non-transitory media) and communication media (or transitory media).
- Computer-readable storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.
- Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, tape, magnetic disk storage or other magnetic storage devices, Or any other medium that can be used to store desired information and that can be accessed by a computer.
- Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
- The computer-readable storage medium may be an internal storage unit of the network device described in the foregoing embodiments, such as a hard disk or memory of the network device.
- The computer-readable storage medium may also be an external storage device equipped on the network device. Examples include a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card.
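The connection-oscillation criterion in the definitions above can be expressed as a sliding-window counter. The criterion is more than 3 connection establishments/disconnections within 10 minutes, with the connection restored after a first preset time period such as 60 minutes. The following Python sketch is illustrative only. The class `FlapDetector` and its parameter names are hypothetical. Keeping the hold-down timer in the same object is also an assumption; in the disclosure, the disconnect and reconnect are performed by the main management module.

```python
from collections import deque
from typing import Deque, Optional


class FlapDetector:
    """Hypothetical sliding-window detector for connection oscillation."""

    def __init__(self, window_s: float = 600.0, max_events: int = 3,
                 hold_down_s: float = 3600.0) -> None:
        self.window_s = window_s        # observation window, e.g. 10 minutes
        self.max_events = max_events    # oscillation if count exceeds this
        self.hold_down_s = hold_down_s  # "first preset time period", e.g. 60 min
        self._events: Deque[float] = deque()
        self.disconnected_until: Optional[float] = None

    def record_event(self, t: float) -> bool:
        """Record one establish/disconnect event at time t (seconds).

        Returns True when this event pushes the count inside the window above
        max_events, i.e. when the first feedback information would be sent.
        """
        self._events.append(t)
        # Drop events that are older than the window.
        while self._events and t - self._events[0] > self.window_s:
            self._events.popleft()
        if len(self._events) > self.max_events and self.disconnected_until is None:
            # Main module disconnects the peer and schedules reconnection.
            self.disconnected_until = t + self.hold_down_s
            self._events.clear()
            return True
        return False

    def may_reconnect(self, now: float) -> bool:
        """True (once) when the hold-down period has elapsed."""
        if self.disconnected_until is not None and now >= self.disconnected_until:
            self.disconnected_until = None
            return True
        return False
```

For example, events at 0 s, 60 s, 120 s, and 180 s trigger on the fourth event, and `may_reconnect` becomes true 3600 s later. Events spaced more than 10 minutes apart never trigger.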
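The second and third monitoring strategies described above both pair a threshold trigger with a corrective action. The Python sketch below illustrates them under several assumptions, none of which the disclosure specifies:

- the function names `traffic_feedback`, `rebalance_bandwidth`, and `cpu_feedback_actions` are hypothetical;
- bandwidth is taken from other services proportionally;
- the adjustment amounts are made up: a 20% CPU frequency increase capped at a maximum, and +20 percentage points of fan power.

```python
from typing import Dict, Optional


def traffic_feedback(received_mbps: float, first_threshold: float) -> bool:
    """Second strategy: True means the monitor sends the second feedback."""
    return received_mbps > first_threshold


def rebalance_bandwidth(alloc: Dict[str, float], target: str,
                        step: float) -> Dict[str, float]:
    """Hypothetical main-module action: move up to `step` Mbit/s to `target`.

    Other services are scaled down proportionally, so the total is unchanged.
    """
    others = {k: v for k, v in alloc.items() if k != target}
    total_others = sum(others.values())
    take = min(step, total_others)
    factor = (total_others - take) / total_others if total_others > 0 else 0.0
    new_alloc = {k: v * factor for k, v in others.items()}
    new_alloc[target] = alloc.get(target, 0.0) + take
    return new_alloc


def cpu_feedback_actions(cpu_util: float, second_threshold: float,
                         hw: Dict[str, float]) -> Optional[Dict[str, float]]:
    """Third strategy: if CPU utilisation exceeds the threshold, return an
    adjusted hardware state (higher CPU frequency, more fan power), else None."""
    if cpu_util <= second_threshold:
        return None
    adjusted = dict(hw)
    adjusted["cpu_freq_ghz"] = min(hw["cpu_freq_ghz"] * 1.2, hw["cpu_freq_max_ghz"])
    adjusted["fan_power_pct"] = min(hw["fan_power_pct"] + 20.0, 100.0)
    return adjusted
```

As a usage example, take the allocation `{"video": 100.0, "backup": 300.0, "web": 100.0}` and move 100 Mbit/s to `"video"`. The result is `{"backup": 225.0, "web": 75.0, "video": 200.0}`. A real device would apply such values through its own QoS and hardware-control interfaces.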
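Two small mechanisms from the definitions above can be sketched as follows:

- choosing the main module by the smaller ID;
- detecting an abnormal main module by missed probe responses.

The names `elect_main` and `HeartbeatMonitor` are assumptions, as are the explicit `now` timestamps and the 1-second probe period. The three-period timeout mirrors the example in the text, where no response within three periods means the main module is abnormal.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def elect_main(module_ids: Iterable[int]) -> int:
    """Preset selection policy from the text: the smaller ID becomes main."""
    return min(module_ids)


@dataclass
class HeartbeatMonitor:
    """Hypothetical probe tracker run by the standby module (times in seconds)."""
    period_s: float = 1.0            # probe interval (assumed value)
    missed_limit: int = 3            # "three times the time period"
    started_at: float = 0.0          # when probing began
    last_response: Optional[float] = None

    def on_response(self, now: float) -> None:
        # A response to a detection message proves the main module is alive.
        self.last_response = now

    def main_is_abnormal(self, now: float) -> bool:
        """True if no response arrived within missed_limit * period_s."""
        ref = self.last_response if self.last_response is not None else self.started_at
        return now - ref > self.missed_limit * self.period_s
```

With a 1-second period, a response received at t = 1.0 keeps the main module healthy until t = 4.0. At t = 4.5 the standby would declare it abnormal and switch over.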
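The two data-synchronization options described above could be modelled as below: periodic synchronization (for example every 5 minutes) and immediate synchronization on change. `SyncedState` and its `mode` values are hypothetical. Two in-memory dictionaries stand in for the state of the main and standby modules; a real device would replicate state between processes.

```python
import copy
from typing import Any, Dict


class SyncedState:
    """Hypothetical main/standby state holder with two sync modes."""

    def __init__(self, mode: str = "on_change", interval_s: float = 300.0) -> None:
        if mode not in ("on_change", "periodic"):
            raise ValueError("mode must be 'on_change' or 'periodic'")
        self.mode = mode
        self.interval_s = interval_s      # e.g. 5 minutes for periodic mode
        self.main: Dict[str, Any] = {}
        self.standby: Dict[str, Any] = {}
        self._last_sync = 0.0

    def write(self, key: str, value: Any) -> None:
        """Change data on the main module; mirror it at once in on_change mode."""
        self.main[key] = value
        if self.mode == "on_change":
            # Deep copy so later in-place edits on main are not shared.
            self.standby[key] = copy.deepcopy(value)

    def tick(self, now: float) -> bool:
        """Periodic mode: copy the full main state every interval_s seconds."""
        if self.mode == "periodic" and now - self._last_sync >= self.interval_s:
            self.standby = copy.deepcopy(self.main)
            self._last_sync = now
            return True
        return False
```

On-change mode keeps the standby current at the cost of one copy per write. Periodic mode batches copies but can lose up to one interval of changes if the main module fails.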
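The shared-communication-interface argument and the restart-as-standby step above can be illustrated together. In this hypothetical sketch, a single `queue.Queue` stands in for the communication link established over the shared interface. Because the new main module reads from that same queue, no link is rebuilt on switchover. The `Manager` and `Device` classes are assumptions, and setting `alive = True` merely stands in for restarting the abnormal module.

```python
import queue
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manager:
    name: str
    role: str                       # "main" or "standby"
    alive: bool = True
    handled: List[str] = field(default_factory=list)


class Device:
    def __init__(self) -> None:
        # One shared link: feedback from monitors always lands in this queue,
        # whichever manager is currently main.
        self.link: "queue.Queue[str]" = queue.Queue()
        self.first = Manager("first", "main")
        self.second = Manager("second", "standby")

    def main(self) -> Manager:
        return self.first if self.first.role == "main" else self.second

    def standby(self) -> Manager:
        return self.second if self.first.role == "main" else self.first

    def drain(self) -> None:
        """The current main module processes all pending feedback."""
        m = self.main()
        while not self.link.empty():
            m.handled.append(self.link.get_nowait())

    def fail_over(self) -> None:
        """Standby becomes main, then restarts the failed module as standby."""
        old_main, new_main = self.main(), self.standby()
        new_main.role = "main"
        old_main.role = "standby"
        old_main.alive = True       # placeholder for an actual restart
```

The design point is that `self.link` is created once, in `__init__`, and never recreated in `fail_over`.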
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Automation & Control Theory (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Embodiments of the present disclosure provide a device operation and maintenance method, a network device, and a storage medium. The network device is configured with two management modules and a plurality of monitoring modules, the two management modules being in a master-standby relationship. The method comprises: a monitoring module monitors a target object and, when a preset condition is satisfied, sends feedback information to the master management module of the two management modules, the feedback information instructing the master management module to perform a preset operation and maintenance action; and the standby management module of the two management modules switches to become the master management module upon determining that an anomaly has occurred in the current master management module.
Description
Cross-References to Related Applications
This disclosure claims priority to Chinese patent application CN202110566486.5, filed on May 24, 2021 and entitled "Device Operation and Maintenance Method, Network Device and Storage Medium", the entire content of which is incorporated herein by reference.
The present disclosure relates to the field of communication technologies, and in particular to a device operation and maintenance method, a network device, and a storage medium.
In traditional network management technology, network devices are usually operated and maintained as follows: the network management platform captures abnormal information in the protocol messages sent by a network device, and administrators maintain the device based on that information. For example, the network device sends NETCONF (Network Configuration Protocol) or SNMP (Simple Network Management Protocol) messages to the network management platform. When the platform captures abnormal information, it notifies administrators through human-computer interaction, and the administrators analyze the information and maintain the network device according to the results.
It follows that the traditional technology lacks intelligence and therefore has certain limitations. For example, it detects faults in network devices with some delay, so operation and maintenance also lag behind. It also requires considerable manpower and material resources, which makes operation and maintenance costly.
Summary of the Invention
On this basis, the embodiments of the present disclosure provide a device operation and maintenance method, a network device, and a storage medium, to address the limitations of the traditional way of operating and maintaining network devices.
第一方面,本公开实施例提供了一种设备运维方法,应用于网络设备,所述网络设备配置有两个管理模块以及若干监测模块,所述两个管理模块互为主备关系,所述方法包括:所述监测模块对目标对象进行监测,并在满足预设条件时向所述两个管理模块中的主管理模块发送反馈信息,所述反馈信息用于指示所述两个管理模块中的主管理模块实施预设的运维动作;所述两个管理模块中的备用管理模块在确定所述两个管理模块中的主管理模块发生异常时,切换为主管理模块。In the first aspect, the embodiment of the present disclosure provides a device operation and maintenance method, which is applied to a network device. The network device is configured with two management modules and several monitoring modules. The two management modules are in a master-backup relationship with each other. The method includes: the monitoring module monitors the target object, and sends feedback information to the main management module of the two management modules when a preset condition is met, and the feedback information is used to indicate that the two management modules The main management module of the two management modules implements preset operation and maintenance actions; the standby management module of the two management modules switches to the main management module when it is determined that the main management module of the two management modules is abnormal.
In a second aspect, an embodiment of the present disclosure provides a network device including a processor and a memory. The memory is configured to store a computer program, and the processor is configured to execute the computer program and, when executing it, implement the device operation and maintenance method described in the first aspect.
In a third aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the device operation and maintenance method described in the first aspect.
FIG. 1 is a schematic diagram of a configuration of a network device in an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a device operation and maintenance method in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of two management modules sharing one communication interface in an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an application scenario in an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another application scenario in an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of how the standby management module determines whether the master management module is abnormal in an embodiment of the present disclosure;
FIG. 7 is a schematic structural block diagram of a network device in an embodiment of the present disclosure.
The technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present disclosure.
The flowcharts shown in the drawings are merely illustrative. They need not include every item or operation/step, and the steps need not be performed in the order described. For example, some operations/steps may be decomposed, combined, or partially merged, so the actual order of execution may vary with circumstances.
Some implementations of this specification are described in detail below with reference to the accompanying drawings. Where they do not conflict, the following embodiments and their features may be combined with one another.
The device operation and maintenance method provided by an embodiment of the present disclosure may be applied to a network device such as a switch or a router; for example, the network device may be a service switch, a service router, or an access router. As shown in FIG. 1, the network device is configured with two management modules and several monitoring modules. The two management modules are in a master-standby relationship, that is, one is the master and the other is the standby. In some implementations, a management module may be a process running on the device system. A monitoring module may likewise be a process running on the device system, but in some implementations it may also include hardware fitted to the network device, such as sensors.
On this basis, the device operation and maintenance method in an embodiment of the present disclosure, as shown in FIG. 2, may include, but is not limited to, steps S10 and S20.
Step S10: the monitoring module monitors a target object and, when a preset condition is met, sends feedback information to the master management module of the two management modules. The feedback information instructs the master management module to perform a preset operation and maintenance action.
Step S20: when the standby management module of the two management modules determines that the current master management module is abnormal, it switches to being the master management module.
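Steps S10 and S20 can be pictured with a minimal, in-process Python sketch. It assumes each module is a plain object and that "abnormal" is a simple health flag; the class names, the `step_s10`/`step_s20` methods, and the CPU threshold of 90 are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ManagementModule:
    """One of the two management modules (hypothetical model)."""
    module_id: int
    is_master: bool = False
    healthy: bool = True
    actions: List[str] = field(default_factory=list)

    def handle_feedback(self, feedback: str) -> None:
        # Record the preset action this module performs for the feedback.
        self.actions.append(f"action-for:{feedback}")


@dataclass
class MonitoringModule:
    """Watches one target object; reports when its preset condition holds."""
    name: str
    condition: Callable[[float], bool]

    def observe(self, value: float) -> Optional[str]:
        return f"{self.name}:{value}" if self.condition(value) else None


class NetworkDevice:
    def __init__(self, first: ManagementModule, second: ManagementModule) -> None:
        self.modules = [first, second]

    @property
    def master(self) -> ManagementModule:
        return next(m for m in self.modules if m.is_master)

    @property
    def standby(self) -> ManagementModule:
        return next(m for m in self.modules if not m.is_master)

    def step_s10(self, monitor: MonitoringModule, value: float) -> None:
        # S10: a monitoring module sends feedback to whichever module is master.
        feedback = monitor.observe(value)
        if feedback is not None:
            self.master.handle_feedback(feedback)

    def step_s20(self) -> None:
        # S20: the standby switches to master once the master is abnormal.
        old, new = self.master, self.standby
        if not old.healthy:
            old.is_master, new.is_master = False, True


first = ManagementModule(1, is_master=True)
second = ManagementModule(2)
device = NetworkDevice(first, second)
cpu = MonitoringModule("cpu", lambda util: util > 90)
device.step_s10(cpu, 95)   # handled by the first module
first.healthy = False
device.step_s20()          # the second module takes over
device.step_s10(cpu, 97)   # now handled by the second module
```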
Device operation and maintenance can be understood as the series of operations an administrator performs on a device when its operating status does not meet expectations, so that the device again operates as expected. Accordingly, in this embodiment the monitoring module can monitor the target object in real time and, upon detecting that a preset condition is met, send feedback information to the master management module, which then performs the preset operation and maintenance action based on the received feedback. Here, the target object is any object related to the operation of the network device. It may include, but is not limited to, at least one of: another device communicatively connected to the network device, hardware of the network device itself, and applications and processes running on the network device's system. The preset condition is a trigger condition configured in advance, and the preset operation and maintenance action is a series of specific operations configured in advance; in other words, when the preset condition is met, the operation and maintenance action is performed. It should also be noted that one target object may be monitored by one or more monitoring modules, and one monitoring module may monitor one or more target objects; this embodiment places no limit on this.
It can be understood that, through the configured management modules and monitoring modules, the network device can perform operation and maintenance automatically. This makes the device intelligent and removes the limitations of conventional approaches: where those approaches suffer from delayed response and high operation and maintenance costs, the management and monitoring modules of this embodiment detect and handle faults promptly and effectively reduce labor and material costs.
As discussed above, the two management modules are in a master-standby relationship, that is, each serves as the other's redundant backup. On this basis, when the standby management module determines that the master management module is abnormal, it can switch to being the master and continue the tasks the master was performing. In one implementation, the standby management module monitors the master management module in real time and switches to master if it determines that an abnormality has occurred. Alternatively, another module monitors the master management module in real time and, upon determining that an abnormality has occurred, notifies the standby management module so that it switches to master. Compared with an implementation having only one management module, this embodiment therefore effectively improves the reliability, that is, the robustness, of the network device. For example, suppose the two management modules are a first management module and a second management module, currently the master and the standby respectively. While the monitoring module is monitoring and while the first management module is performing operation and maintenance actions, the second management module switches to master upon determining that the first management module is abnormal, which prevents an abnormality of the first management module from causing the device to malfunction.
In one embodiment, as shown in FIG. 3, the two management modules communicate with the monitoring module through the same communication interface. The communication interface handles data exchange between the management modules and the monitoring module and may include, for example, a process running on the device system. Because the two management modules share the same interface, after the standby management module switches to master it can continue to receive the feedback information sent by the monitoring module over the previously established communication link, namely the link established over the communication interface between the monitoring module and the management module that became abnormal. For example, suppose the two management modules are a first management module and a second management module, currently the master and the standby respectively. The first management module has already established a communication link through the communication interface, so when the first management module becomes abnormal, the second management module can continue to receive the feedback information from the monitoring module directly over that link.
It follows that the standby management module does not need to re-establish a communication link when switching over; it can interact with the monitoring module over the existing link. Compared with an implementation in which the two management modules communicate with the monitoring module through different communication interfaces, this embodiment effectively shortens the switchover time of the standby management module and thereby improves the reliability of the network device. To illustrate the alternative: suppose the first management module (the master) communicates with the monitoring module through a first communication interface and the second management module (the standby) through a second communication interface. When the first management module becomes abnormal, the second management module must establish a communication link with the monitoring module over the second communication interface during switchover, which makes the switchover take too long. In this embodiment the two management modules share one communication interface and no link needs to be re-established, so the switchover time is effectively reduced.
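A minimal Python sketch of why a shared interface avoids re-establishing the link, assuming the link can be modelled as a single in-memory `queue.Queue` that both management modules reference. `SharedInterface`, `drain`, and the message strings are hypothetical; a real device would use an inter-process channel.

```python
import queue
from typing import List


class SharedInterface:
    """Hypothetical stand-in for the single communication interface of FIG. 3.

    The link (modelled as a queue) is created once; both management modules
    hold a reference to the same object, so switchover needs no new link.
    """

    def __init__(self) -> None:
        self.link: "queue.Queue[str]" = queue.Queue()

    def send(self, feedback: str) -> None:
        # Called by monitoring modules to deliver feedback information.
        self.link.put(feedback)


class ManagementModule:
    def __init__(self, name: str, iface: SharedInterface) -> None:
        self.name = name
        self.iface = iface  # the same interface object for both modules
        self.received: List[str] = []

    def drain(self) -> None:
        # Consume every pending feedback message on the existing link.
        while True:
            try:
                self.received.append(self.iface.link.get_nowait())
            except queue.Empty:
                return


iface = SharedInterface()
first = ManagementModule("first", iface)
second = ManagementModule("second", iface)
iface.send("flap:peer-1")
first.drain()            # the first module, as master, consumes the feedback
iface.send("cpu:95")     # arrives after the first module has failed
second.drain()           # the new master reads the same, unchanged link
```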
In one embodiment, step S10 may include: the monitoring module monitors the target object based on a preset monitoring policy issued by the master management module and, upon detecting that the operating status of the target object meets the trigger condition set by the preset monitoring policy, sends feedback information to the master management module.
The master management module can send the preset monitoring policy to the monitoring module in advance, so that the monitoring module monitors the target object in real time according to that policy. When the monitoring module detects that the trigger condition set by the policy is met, it sends feedback information to the master management module, instructing it to perform an operation and maintenance action.
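One way to model a policy issued by the master is as a record pairing a monitored metric with a trigger condition and the feedback to send. The sketch below is a hypothetical Python illustration; the field names, metric keys, and thresholds are assumptions, since the text leaves them configurable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MonitoringPolicy:
    """A preset monitoring policy issued by the master (hypothetical shape)."""
    name: str
    metric: str
    trigger: Callable[[float], bool]
    feedback: str


class PolicyMonitor:
    def __init__(self) -> None:
        self.policies: List[MonitoringPolicy] = []

    def install(self, policy: MonitoringPolicy) -> None:
        # Called when the master management module pushes a policy down.
        self.policies.append(policy)

    def evaluate(self, sample: Dict[str, float]) -> List[str]:
        # Return the feedback of every policy whose trigger condition holds.
        return [
            p.feedback
            for p in self.policies
            if p.metric in sample and p.trigger(sample[p.metric])
        ]


monitor = PolicyMonitor()
# Illustrative thresholds only; the text leaves them configurable.
monitor.install(MonitoringPolicy("cpu-high", "cpu_util", lambda v: v > 0.9, "third-feedback"))
monitor.install(MonitoringPolicy("video-rx", "video_rx_mbps", lambda v: v > 50.0, "second-feedback"))
triggered = monitor.evaluate({"cpu_util": 0.95, "video_rx_mbps": 20.0})
```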
In one implementation, the preset monitoring policy may include at least one of a first monitoring policy, a second monitoring policy, and a third monitoring policy.
The first monitoring policy instructs the monitoring module to monitor a target device connected to the network device and, upon detecting connection flapping between the target device and the network device, to send first feedback information to the master management module. The first feedback information instructs the master management module to disconnect the network device from the target device and to re-establish the connection after a first preset duration. In one implementation, connection flapping between the target device and the network device means that the connection between the two devices is repeatedly established and torn down within a certain period, for example, more than 3 connection establishments and teardowns within 10 minutes. Prolonged connection flapping may cause route flapping across the entire network. To prevent this, upon detecting connection flapping between the target device and the network device, the monitoring module can send the first feedback information to the master management module, which then disconnects the network device from the target device and thereby prevents network-wide route flapping. After a certain period, the master management module can re-establish the connection between the network device and the target device to restore normal operation. As an example, as shown in FIG. 4, the network device communicates with a first device and a second device, and the three form a network route. The monitoring module can monitor the first device (the target device) in real time based on the first monitoring policy and, upon detecting connection flapping, send the first feedback information (which can be configured as appropriate) to the master management module. The master management module then disconnects the network device from the first device, preventing route flapping across the network (for example, preventing the connection between the network device and the second device from flapping), and re-establishes the connection after the first preset duration (for example, 60 minutes), after which the network device operates normally.
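The flapping check in the example (more than 3 connection events within 10 minutes, reconnect after 60 minutes) can be sketched as a sliding-window counter. This Python sketch is hypothetical: it treats each establishment or teardown as one event, although the text could also be read as counting complete up/down cycles, and it takes caller-supplied timestamps in seconds instead of reading a real clock.

```python
from collections import deque
from typing import Deque, Optional


class FlapDetector:
    """Sliding-window flap check for one neighbouring device (hypothetical)."""

    def __init__(self, window_s: float = 600.0, max_events: int = 3,
                 hold_down_s: float = 3600.0) -> None:
        self.window_s = window_s          # 10-minute observation window
        self.max_events = max_events      # flapping when events exceed this
        self.hold_down_s = hold_down_s    # first preset duration (60 minutes)
        self.events: Deque[float] = deque()
        self.suppressed_until: Optional[float] = None

    def record_event(self, now: float) -> bool:
        """Record one connect or disconnect at `now`; True means flapping.

        A True result corresponds to sending the first feedback information.
        """
        self.events.append(now)
        # Forget events that have slid out of the observation window.
        while self.events and now - self.events[0] > self.window_s:
            self.events.popleft()
        if self.suppressed_until is None and len(self.events) > self.max_events:
            # The master disconnects now and reconnects after the hold-down.
            self.suppressed_until = now + self.hold_down_s
            self.events.clear()
            return True
        return False

    def reconnect_due(self, now: float) -> bool:
        """True once the hold-down has expired and the link may be rebuilt."""
        if self.suppressed_until is not None and now >= self.suppressed_until:
            self.suppressed_until = None
            return True
        return False


detector = FlapDetector()
flaps = [detector.record_event(t) for t in (0, 60, 120, 180)]  # 4 events in 3 minutes
```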
The second monitoring policy instructs the monitoring module to monitor a target service and, upon detecting that the received traffic of the target service exceeds a first preset threshold, to send second feedback information to the master management module. The second feedback information instructs the master management module to limit the bandwidth of other services and/or increase the bandwidth of the target service. In one implementation, if the received traffic of the target service increases while its bandwidth stays the same, the Quality of Service (QoS) of the target service suffers. To prevent this, upon detecting that the received traffic of the target service exceeds the first preset threshold, the monitoring module can send the second feedback information to the master management module, which then limits the bandwidth of other services and/or increases the bandwidth of the target service to guarantee the target service's QoS. In one implementation, the target service may include a video conferencing service. For example, as shown in FIG. 5, the network device is holding a video conference with a remote device. To guarantee the QoS of the video conference, the monitoring module can monitor the video conferencing service (the target service) and, upon detecting that its received traffic exceeds the first preset threshold (which can be set as appropriate), send the second feedback information (which can be configured as appropriate) to the master management module, which can then limit the bandwidth of other services and/or increase the bandwidth of the target service.
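A hypothetical Python sketch of the second policy's action. It assumes bandwidth is tracked as a per-service allocation in Mbit/s and that "limiting other services" means scaling each of them by a fixed `cap_ratio` and handing the freed bandwidth to the target service. The text permits either measure on its own; the function name and parameters are illustrative.

```python
from typing import Dict


def rebalance(bandwidth_mbps: Dict[str, float], target: str,
              rx_mbps: float, threshold_mbps: float,
              cap_ratio: float = 0.8) -> Dict[str, float]:
    """Return a new allocation after applying the second policy's action.

    If the target service's received traffic exceeds the threshold, every
    other service is capped at `cap_ratio` of its current bandwidth and the
    freed bandwidth is added to the target service.
    """
    if rx_mbps <= threshold_mbps:
        return dict(bandwidth_mbps)  # below threshold: allocation unchanged
    new_alloc: Dict[str, float] = {}
    freed = 0.0
    for service, bw in bandwidth_mbps.items():
        if service != target:
            new_alloc[service] = bw * cap_ratio
            freed += bw - new_alloc[service]
    new_alloc[target] = bandwidth_mbps[target] + freed
    return new_alloc


current = {"video_conf": 40.0, "backup": 50.0, "web": 10.0}
updated = rebalance(current, "video_conf", rx_mbps=45.0,
                    threshold_mbps=35.0, cap_ratio=0.5)
```

Reallocating freed bandwidth keeps the total constant, which suits a link whose capacity is fixed.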
The third monitoring policy instructs the monitoring module to monitor target hardware and, upon detecting that the utilization of the target hardware exceeds a second preset threshold, to send third feedback information to the master management module. The third feedback information instructs the master management module to adjust the operating state of the target hardware and/or other hardware. In one implementation, the target hardware may include the CPU of the network device. When the monitoring module detects that CPU utilization exceeds the second preset threshold (which can be set as appropriate), it sends the third feedback information (which can be configured as appropriate) to the master management module. The master management module can then adjust the operating state of the target hardware (for example, raise the CPU operating frequency) and/or of other hardware (for example, increase the fan output power) to keep the network device operating normally.
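The third policy's reaction to high CPU utilization could look like the following Python sketch. `HardwareState` and its fields are hypothetical stand-ins for whatever clock and fan controls the device exposes, and the 0.85 threshold and 200 MHz step are illustrative values only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HardwareState:
    """Hypothetical knobs the master can adjust (not a real driver API)."""
    cpu_freq_mhz: int
    max_cpu_freq_mhz: int
    fan_level: int
    max_fan_level: int


def on_cpu_sample(state: HardwareState, cpu_util: float,
                  threshold: float = 0.85, freq_step_mhz: int = 200) -> List[str]:
    """Apply the third policy's action when CPU utilization is too high."""
    actions: List[str] = []
    if cpu_util <= threshold:
        return actions
    if state.cpu_freq_mhz < state.max_cpu_freq_mhz:
        # Adjust the target hardware itself: raise the CPU clock one step.
        state.cpu_freq_mhz = min(state.cpu_freq_mhz + freq_step_mhz,
                                 state.max_cpu_freq_mhz)
        actions.append(f"cpu_freq={state.cpu_freq_mhz}")
    if state.fan_level < state.max_fan_level:
        # Adjust other hardware: raise the fan output by one level.
        state.fan_level += 1
        actions.append(f"fan_level={state.fan_level}")
    return actions


state = HardwareState(cpu_freq_mhz=1800, max_cpu_freq_mhz=2000,
                      fan_level=2, max_fan_level=5)
actions = on_cpu_sample(state, 0.95)
```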
It should be noted that in other implementations the preset monitoring policy may also include other policies, such as service process status monitoring and service quality monitoring; this embodiment places no limit on this.
In one embodiment, the method may further include: designating in advance one of the two management modules as the master management module. In one implementation, before the monitoring module starts monitoring, the network device designates one of the two management modules as the master. For instance, the network device can make this choice based on the ID values of the two management modules according to a preset selection policy; for example, the policy may designate the management module with the smaller ID value as the master management module.
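The example selection policy (the smaller ID becomes the master) reduces to a comparison. A minimal Python sketch, using a hypothetical `elect_master` helper that also rejects duplicate IDs:

```python
from typing import Dict, Sequence


def elect_master(module_ids: Sequence[int]) -> Dict[int, str]:
    """Assign roles under the example selection policy: smaller ID is master."""
    if len(module_ids) != 2 or module_ids[0] == module_ids[1]:
        raise ValueError("expected two management modules with distinct IDs")
    master = min(module_ids)
    return {mid: ("master" if mid == master else "standby") for mid in module_ids}


roles = elect_master([7, 3])
```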
In one embodiment, the method may further include: synchronizing data between the two management modules. Because the two management modules are in a master-standby relationship, their data must be kept synchronized. In one implementation, the two management modules are synchronized periodically, for example every 5 minutes; in other implementations, whenever data on the master management module changes, the standby management module applies the same change.
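Both synchronization modes can be sketched with a small key/value store. In this hypothetical Python illustration, `ManagedState` is not from the disclosure: `set` mirrors each change to the standby when change-driven sync is enabled, and `full_sync` is what a timer (for example, every 5 minutes) would call.

```python
from typing import Dict, Optional


class ManagedState:
    """Key/value state held by one management module (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.peer: Optional["ManagedState"] = None  # the standby's copy
        self.sync_on_change = True

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        # Change-driven mode: push each modification to the standby at once.
        if self.sync_on_change and self.peer is not None:
            self.peer.data[key] = value

    def full_sync(self) -> None:
        # Timed mode: replace the standby's copy with a snapshot of ours.
        if self.peer is not None:
            self.peer.data = dict(self.data)


master, standby = ManagedState(), ManagedState()
master.peer = standby
master.set("policy:cpu", "0.85")      # mirrored immediately
master.sync_on_change = False
master.set("mode", "strict")          # waits for the next timed sync
```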
In one embodiment, determining by the standby management module that the master management module is abnormal may include: the standby management module periodically sends a preset probe message to the master management module; and if, after any probe message is sent, no response message from the master management module is received within a second preset duration, the standby management module determines that the master management module is abnormal. As shown in FIG. 6, if the standby management module receives a response message after sending a probe message, it determines that the master management module is not abnormal and continues sending probe messages to it. If no response message is received within the second preset duration, it determines that the master management module is abnormal and switches to being the master management module. For example, if no response message is received within three probe periods, the master management module is determined to be abnormal.
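The probe-and-timeout logic of FIG. 6 can be sketched as below, using a simulated clock so it runs without sleeping. This is a hypothetical Python illustration: `send_probe` stands in for sending a probe message and waiting for the response, and the timeout is measured from the last answered probe, defaulting to three probe periods as in the example.

```python
from typing import Callable, Optional


class HeartbeatMonitor:
    """Standby-side probe loop driven by a simulated clock (seconds)."""

    def __init__(self, send_probe: Callable[[float], bool],
                 period_s: float = 1.0,
                 timeout_s: Optional[float] = None) -> None:
        self.send_probe = send_probe  # returns True if the master answered
        self.period_s = period_s
        # Second preset duration; defaults to three probe periods.
        self.timeout_s = timeout_s if timeout_s is not None else 3 * period_s
        self.last_reply = 0.0

    def tick(self, now: float) -> bool:
        """Send one probe at `now`; return True if the master is abnormal."""
        if self.send_probe(now):
            self.last_reply = now
            return False
        return now - self.last_reply >= self.timeout_s

    def run(self, until_s: float) -> Optional[float]:
        """Probe once per period; return the switchover time, or None."""
        now = 0.0
        while now <= until_s:
            if self.tick(now):
                return now  # the standby would switch to master here
            now += self.period_s
        return None


# A master that stops answering from t = 5 s onward.
switch_at = HeartbeatMonitor(lambda now: now < 5.0).run(until_s=20.0)
```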
In one embodiment, after the standby management module switches to being the master management module, the method may further include: the new master management module restarts the management module that became abnormal and instructs it to switch to standby after restarting. For example, suppose the two management modules are a first management module and a second management module, currently the master and the standby respectively. If the first management module becomes abnormal, the second management module switches to master. After the switchover completes, the second management module restarts the first management module and instructs it to become the standby management module after the restart, restoring it to normal operation.
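A minimal Python sketch of the switchover followed by the restart and demotion of the failed module. `Module` and `fail_over` are hypothetical, and the restart is modelled simply as clearing the fault flag, assuming the restart succeeds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    name: str
    role: str          # "master" or "standby"
    healthy: bool = True
    restarts: int = 0


def fail_over(modules: List[Module]) -> Module:
    """Switch to the standby, then restart the failed module as the new standby."""
    failed = next(m for m in modules if m.role == "master")
    new_master = next(m for m in modules if m.role == "standby")
    if failed.healthy:
        return failed  # master is fine: nothing to do
    new_master.role = "master"
    # The new master restarts the failed module and demotes it to standby.
    failed.restarts += 1
    failed.healthy = True
    failed.role = "standby"
    return new_master


first = Module("first", "master", healthy=False)
second = Module("second", "standby")
current_master = fail_over([first, second])
```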
An embodiment of the present disclosure further provides a network device, as shown in FIG. 7, including a processor and a memory. The memory is configured to store a computer program, and the processor is configured to execute the computer program and, when executing it, implement any device operation and maintenance method provided by the embodiments of the present disclosure.
It should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement any device operation and maintenance method provided by the embodiments of the present disclosure.
A person of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, or a suitable combination thereof. In a hardware implementation, the division into functional modules/units mentioned above does not necessarily correspond to a division into physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all physical components may be implemented as software executed by a processor such as a central processing unit, digital signal processor, or microprocessor, as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer-readable storage media (or non-transitory media) and communication media (or transitory media).
As is well known to a person of ordinary skill in the art, the term computer-readable storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer-readable storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to a person of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
For example, the computer-readable storage medium may be an internal storage unit of the network device described in the foregoing embodiments, such as the hard disk or memory of the network device. It may also be an external storage device of the network device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the network device.
The above are merely specific implementations of the present disclosure, and the protection scope of the present disclosure is not limited thereto. Any equivalent modification or replacement readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (10)
- 1. A device operation and maintenance method, applied to a network device, wherein the network device is configured with two management modules and several monitoring modules, the two management modules being in a main/standby relationship with each other, and the method comprises:
  - monitoring, by the monitoring module, a target object, and sending feedback information to the main management module of the two management modules when a preset condition is met, wherein the feedback information is used to instruct the main management module to perform a preset operation and maintenance action; and
  - switching, by the standby management module of the two management modules, to be the main management module upon determining that the main management module of the two management modules is abnormal.
- 2. The method according to claim 1, wherein the two management modules communicate with the monitoring module through the same communication interface, so that after the standby management module of the two management modules switches to be the main management module, it can continue to receive the feedback information sent by the monitoring module over the previously established communication link, the previously established communication link being the communication link established over the communication interface between the abnormal management module and the monitoring module.
- 3. The method according to claim 1, wherein monitoring, by the monitoring module, the target object and sending feedback information to the main management module of the two management modules when the preset condition is met comprises:
  - monitoring, by the monitoring module, the target object based on a preset monitoring strategy issued by the main management module, and sending feedback information to the main management module upon detecting that the operating status of the target object meets a trigger condition set by the preset monitoring strategy.
- 4. The method according to claim 3, wherein the preset monitoring strategy comprises at least one of a first monitoring strategy, a second monitoring strategy, and a third monitoring strategy, wherein:
  - the first monitoring strategy is used to instruct the monitoring module to monitor a target device connected to the network device and, upon detecting connection oscillation (flapping) between the target device and the network device, to send first feedback information to the main management module; the first feedback information is used to instruct the main management module to disconnect the network device from the target device and to re-establish the connection between the network device and the target device after a first preset duration;
  - the second monitoring strategy is used to instruct the monitoring module to monitor a target service and, upon detecting that the received traffic of the target service exceeds a first preset threshold, to send second feedback information to the main management module, the target service comprising a video conference service; the second feedback information is used to instruct the main management module to limit the bandwidth of other services and/or increase the bandwidth of the target service; and
  - the third monitoring strategy is used to instruct the monitoring module to monitor target hardware and, upon detecting that the utilization of the target hardware exceeds a second preset threshold, to send third feedback information to the main management module, the target hardware comprising a CPU; the third feedback information is used to instruct the main management module to adjust the running state of the target hardware and/or the running state of other hardware.
- 5. The method according to any one of claims 1 to 4, wherein the method further comprises:
  - determining in advance one of the two management modules as the main management module; and/or
  - performing data synchronization processing on the two management modules.
- 6. The method according to claim 5, wherein determining one of the two management modules as the main management module comprises:
  - determining, based on a preset selection strategy, one of the two management modules as the main management module according to the respective ID values of the two management modules.
- 7. The method according to any one of claims 1 to 4, wherein determining, by the standby management module of the two management modules, that the main management module of the two management modules is abnormal comprises:
  - periodically sending, by the standby management module, a preset detection message to the main management module; and
  - determining, by the standby management module, that the main management module is abnormal if, after any sending of the detection message, no response message returned by the main management module is received within a second preset duration.
- 8. The method according to any one of claims 1 to 4, wherein after the standby management module of the two management modules switches to be the main management module, the method further comprises:
  - restarting, by the main management module, the abnormal management module, and instructing the abnormal management module to switch to be the standby management module after restarting.
- 9. A network device, comprising a processor and a memory, wherein:
  - the memory is configured to store a computer program; and
  - the processor is configured to execute the computer program and, when executing the computer program, implement the device operation and maintenance method according to any one of claims 1 to 8.
- 10. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the device operation and maintenance method according to any one of claims 1 to 8.
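Claims 1, 2 and 8 together describe a failover pattern in which monitoring modules address a shared communication interface rather than a particular physical module. The Python sketch below illustrates that idea under stated assumptions. The shared interface is modelled as an in-memory queue, and feedback is a plain string. The class and method names (`ManagementPair`, `monitoring_send`, `main_process`, `fail_over`) are hypothetical and do not come from the disclosure. It is an illustration, not the patent's implementation.

```python
from collections import deque


class ManagementPair:
    """Two management modules in a main/standby relationship that share one
    communication interface, modelled here as a single in-memory queue
    (hypothetical structure, for illustration only)."""

    def __init__(self, main_id: str, standby_id: str) -> None:
        self.interface = deque()   # shared link the monitoring modules write to
        self.main = main_id
        self.standby = standby_id
        self.handled = []          # (module that acted, feedback) pairs

    def monitoring_send(self, feedback: str) -> None:
        # A monitoring module only knows the shared interface, never which
        # physical module is currently main.
        self.interface.append(feedback)

    def main_process(self) -> None:
        # Whichever module is currently main drains pending feedback and acts on it.
        while self.interface:
            self.handled.append((self.main, self.interface.popleft()))

    def fail_over(self) -> None:
        # The standby has judged the main abnormal and takes over (claim 1);
        # the new main restarts the failed module, which rejoins as the
        # standby (claim 8). The restart itself is not modelled.
        failed = self.main
        self.main = self.standby
        self.standby = failed


pair = ManagementPair("A", "B")
pair.monitoring_send("CPU utilization above threshold")
pair.main_process()                     # handled by A
pair.monitoring_send("link flapping")   # queued just before A fails
pair.fail_over()
pair.main_process()                     # handled by B over the same link
print(pair.handled)
# [('A', 'CPU utilization above threshold'), ('B', 'link flapping')]
```

The monitoring module never addresses A or B directly. As a result, feedback queued just before the failure is still handled by B after the switchover, which is the property claim 2 attributes to the shared interface.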
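Claim 4 does not specify how connection oscillation is detected for the first monitoring strategy. For illustration only, this sketch assumes one common approach: counting up/down transitions of the link within a sliding time window. The names `FlapDetector`, `max_changes`, `window_s` and `first_feedback_plan` are hypothetical. The sketch also shows the main module's reaction to first feedback as claimed: disconnect, then reconnect after the first preset duration.

```python
from collections import deque


class FlapDetector:
    """First monitoring strategy (claim 4), under an assumed rule: the
    connection to a target device is oscillating when its up/down state
    changes at least max_changes times within window_s seconds."""

    def __init__(self, max_changes: int, window_s: float) -> None:
        self.max_changes = max_changes
        self.window_s = window_s
        self.changes = deque()    # timestamps of recent state transitions
        self.last_state = None    # last observed link state (None = unknown)

    def observe(self, t: float, link_up: bool) -> bool:
        """Record the link state seen at time t; return True on oscillation."""
        if self.last_state is not None and link_up != self.last_state:
            self.changes.append(t)
        self.last_state = link_up
        # Forget transitions that have slid out of the window.
        while self.changes and t - self.changes[0] > self.window_s:
            self.changes.popleft()
        return len(self.changes) >= self.max_changes


def first_feedback_plan(now: float, first_preset_duration: float) -> list:
    """Main-module reaction to first feedback: disconnect now and reconnect
    after the first preset duration, returned as (action, time) pairs."""
    return [("disconnect", now), ("reconnect", now + first_preset_duration)]


det = FlapDetector(max_changes=4, window_s=10.0)
samples = [(0, True), (1, False), (2, True), (3, False), (4, True)]
flags = [det.observe(t, up) for t, up in samples]
print(flags)  # [False, False, False, False, True]
if flags[-1]:
    print(first_feedback_plan(now=4.0, first_preset_duration=30.0))
    # [('disconnect', 4.0), ('reconnect', 34.0)]
```

A window-based count tolerates an occasional single drop. It flags only repeated transitions, which is what distinguishes oscillation from an ordinary outage.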
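The second and third monitoring strategies of claim 4 are threshold checks that generate feedback for the main module. This minimal Python sketch rests on three assumptions: traffic is measured in Mbit/s, utilization is expressed as a fraction, and bandwidth is adjusted by a hypothetical policy that caps other services and raises the target service to its observed traffic. The claim itself only says that bandwidth is limited and/or increased, not by how much. All function and field names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    strategy: int   # 2 = second monitoring strategy, 3 = third (claim 4)
    target: str     # e.g. "video_conference" or "cpu"
    value: float    # observed traffic in Mbit/s, or utilization in [0, 1]


def check_traffic(service: str, rx_mbps: float,
                  first_threshold: float) -> Optional[Feedback]:
    """Second strategy: report when received traffic exceeds the first preset threshold."""
    return Feedback(2, service, rx_mbps) if rx_mbps > first_threshold else None


def check_hardware(hw: str, utilization: float,
                   second_threshold: float) -> Optional[Feedback]:
    """Third strategy: report when utilization exceeds the second preset threshold."""
    return Feedback(3, hw, utilization) if utilization > second_threshold else None


def apply_bandwidth_feedback(fb: Feedback, bandwidth: dict[str, float],
                             other_cap: float) -> dict[str, float]:
    """Hypothetical main-module reaction to second feedback: cap every other
    service at other_cap and raise the target service to at least the
    traffic it is actually receiving."""
    new = {svc: (bw if svc == fb.target else min(bw, other_cap))
           for svc, bw in bandwidth.items()}
    new[fb.target] = max(bandwidth.get(fb.target, 0.0), fb.value)
    return new


fb = check_traffic("video_conference", rx_mbps=120.0, first_threshold=100.0)
print(apply_bandwidth_feedback(fb, {"video_conference": 100.0, "backup": 500.0},
                               other_cap=200.0))
# {'video_conference': 120.0, 'backup': 200.0}
```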
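Claim 6 leaves the "preset selection strategy" open. Purely as an assumption for illustration, this sketch makes the module with the smaller ID the main module; the opposite rule would be equally consistent with the claim. `ManagementModule` and `elect_main` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ManagementModule:
    module_id: int            # ID value compared by the selection strategy
    role: str = "unassigned"  # becomes "main" or "standby"


def elect_main(a: ManagementModule, b: ManagementModule) -> ManagementModule:
    """Assumed selection strategy: the smaller ID becomes the main module."""
    if a.module_id == b.module_id:
        raise ValueError("management module IDs must differ")
    main, standby = (a, b) if a.module_id < b.module_id else (b, a)
    main.role, standby.role = "main", "standby"
    return main


slot_a, slot_b = ManagementModule(7), ManagementModule(3)
print(elect_main(slot_a, slot_b).module_id)  # 3
print(slot_a.role, slot_b.role)              # standby main
```

Because the rule depends only on the two IDs, both modules reach the same result independently. This lets them agree on roles without a separate negotiation step.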
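Claim 7 describes a periodic probe/response health check run by the standby module. This sketch imitates it with two Python threads exchanging messages over `queue.Queue`. The probe period, the reply timeout (standing in for the second preset duration) and the function names are hypothetical; a real device would exchange network packets rather than queue items.

```python
import queue
import threading
import time


def standby_watch(probe_q: queue.Queue, reply_q: queue.Queue,
                  period_s: float, reply_timeout_s: float,
                  max_probes: int) -> bool:
    """Standby side of claim 7: send a detection message every period_s
    seconds and judge the main module abnormal as soon as one goes
    unanswered for reply_timeout_s. Returns True if the main is judged
    abnormal, False if all max_probes probes were answered."""
    for seq in range(max_probes):
        probe_q.put(seq)                          # detection message
        try:
            reply_q.get(timeout=reply_timeout_s)  # response message
        except queue.Empty:
            return True                           # no response in time
        time.sleep(period_s)
    return False


def simulated_main(probe_q: queue.Queue, reply_q: queue.Queue,
                   answers: int) -> None:
    """Stand-in main module that answers `answers` probes, then goes silent."""
    for _ in range(answers):
        reply_q.put(probe_q.get())


def run(answers: int, max_probes: int) -> bool:
    probe_q, reply_q = queue.Queue(), queue.Queue()
    threading.Thread(target=simulated_main,
                     args=(probe_q, reply_q, answers), daemon=True).start()
    return standby_watch(probe_q, reply_q, period_s=0.01,
                         reply_timeout_s=0.2, max_probes=max_probes)


print(run(answers=3, max_probes=3))  # False: every probe answered
print(run(answers=2, max_probes=5))  # True: the third probe times out
```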
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110566486.5 | 2021-05-24 | | |
CN202110566486.5A CN115396295A (en) | 2021-05-24 | 2021-05-24 | Equipment operation and maintenance method, network equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022247675A1 (en) | 2022-12-01 |
Family ID: 84114641
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/093050 WO2022247675A1 (en) | 2021-05-24 | 2022-05-16 | Device operation and maintenance method, network device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115396295A (en) |
WO (1) | WO2022247675A1 (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009237758A (en) * | 2008-03-26 | 2009-10-15 | Nec Corp | Server system, server management method, and program therefor |
CN102868597A (en) * | 2012-10-08 | 2013-01-09 | 鞠洪尧 | Intelligent redundant gateway |
CN103607297A (en) * | 2013-11-07 | 2014-02-26 | 上海爱数软件有限公司 | Fault processing method of computer cluster system |
CN106815071A (en) * | 2017-01-12 | 2017-06-09 | 上海轻维软件有限公司 | Big data job scheduling system based on directed acyclic graph |
JP2017215872A (en) * | 2016-06-01 | 2017-12-07 | 日本電信電話株式会社 | Backup control server, and application data backup method for service control server |
CN109412863A (en) * | 2018-11-23 | 2019-03-01 | 广州市成格信息技术有限公司 | A kind of novel intelligent AI O&M awareness apparatus |
CN109495893A (en) * | 2018-12-13 | 2019-03-19 | 叶东海 | A kind of mobile data Traffic Anomaly monitoring system |
CN110719601A (en) * | 2019-09-18 | 2020-01-21 | 四川豪威尔信息科技有限公司 | 5G base station online management system based on Internet of things |
CN111371584A (en) * | 2018-12-26 | 2020-07-03 | 中兴通讯股份有限公司 | Equipment fault processing method, management equipment and home gateway equipment |
CN111865974A (en) * | 2020-07-17 | 2020-10-30 | 上海国际技贸联合有限公司 | Network security defense system and method |
CN111858176A (en) * | 2020-07-22 | 2020-10-30 | 欧冶云商股份有限公司 | Remote monitoring fault self-healing system and method |
CN112165501A (en) * | 2020-08-05 | 2021-01-01 | 宁夏无线互通信息技术有限公司 | Remote operation and maintenance system and method for product analysis based on industrial internet identification |
CN112600718A (en) * | 2021-01-01 | 2021-04-02 | 谭世克 | Communication network monitoring management system |
- 2021-05-24: CN application CN202110566486.5A (CN115396295A), status: active, pending
- 2022-05-16: WO application PCT/CN2022/093050 (WO2022247675A1), status: active, application filing
Also Published As
Publication number | Publication date |
---|---|
CN115396295A (en) | 2022-11-25 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22810402; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 22810402; Country of ref document: EP; Kind code of ref document: A1 |