CN111083003A

CN111083003A - Monitoring system and method, storage medium and processor

Info

Publication number: CN111083003A
Application number: CN201811230530.XA
Authority: CN
Inventors: Jiang Haoran (姜浩然)
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2018-10-22
Filing date: 2018-10-22
Publication date: 2020-04-28

Abstract

The invention provides a monitoring system and method, a storage medium and a processor. The system comprises at least two monitoring nodes for monitoring the service board; each of the at least two monitoring nodes is connected to the bus, and the monitoring nodes are connected to each other in pairs. The invention solves the problem in the related art that the state of a monitoring node cannot be effectively collected. It allows the state of the monitoring nodes to be collected effectively and enhances the robustness of the monitoring system.

Description

Monitoring system and method, storage medium and processor

Technical Field

The present invention relates to the field of communications, and in particular, to a monitoring system and method, a storage medium, and a processor.

Background

In the communication field, the monitoring system of a large-scale switch currently uses a center-and-node monitoring mode. A monitoring board is installed on each service board to monitor the running state of that board, and this monitoring board is independent of the service board. Each monitoring board collects the running state information of its service board and reports it to the central monitoring board. Meanwhile, the power supplies, fans and other components of the chassis also upload their state information to the central monitoring board. After collecting this information, the central monitoring board judges the running state of the system and then issues corresponding alarms or takes corrective actions. This architecture is shown in FIG. 1.

This monitoring system depends on the central monitoring board: if the central monitoring board is damaged, the whole monitoring system cannot work normally. In addition, the nodes are independent of each other, so if one monitoring node cannot work normally, its state information cannot be collected. In other words, the related art cannot effectively collect the state of the monitoring nodes.

In view of the above technical problems, no effective solution has been proposed in the related art.

Disclosure of Invention

Embodiments of the present invention provide a monitoring system and method, a storage medium, and a processor, so as to at least solve the problem in the related art that the state of a monitoring node cannot be effectively collected.

According to an embodiment of the present invention, there is provided a monitoring system including at least two monitoring nodes for monitoring the service board, wherein each of the at least two monitoring nodes is connected to a bus, and the monitoring nodes are connected to each other in pairs.

Optionally, the system further comprises at least two heat dissipation devices and at least two power supplies, wherein the at least two heat dissipation devices and the at least two power supplies are all connected to the bus; the heat dissipation devices are connected to each other in pairs and are connected to the service board; and the power supplies are connected to each other in pairs and are connected to the service board.

Optionally, the at least two monitoring nodes are further configured to collect, through the bus, status information of the following devices: the at least two monitoring nodes, the at least two heat dissipation devices and the at least two power supplies.

Optionally, the at least two monitoring nodes are further configured to determine an abnormal device according to the acquired state information, perform alarm processing on the abnormal device, and broadcast a processing result on the bus.

Optionally, monitoring devices are respectively arranged in the at least two heat dissipation devices and the at least two power supplies, wherein the monitoring devices in the heat dissipation devices are used for transmitting the state information of the heat dissipation devices to the at least two monitoring nodes through the bus, and the monitoring devices in the power supplies are used for transmitting the state information of the power supplies to the at least two monitoring nodes through the bus.

Optionally, the at least two monitoring nodes are further configured to collect status information of the service board and broadcast the status information to the bus, where different monitoring nodes correspond to different identification numbers, and different identification numbers correspond to different priorities for broadcasting the status information of the service board.

Optionally, when one of the monitoring nodes connected in pairs fails, the other monitoring node is used to restart and/or upgrade the failed monitoring node.

According to another embodiment of the present invention, there is provided a monitoring method including: monitoring, by at least two monitoring nodes in a monitoring system, a service board, wherein each of the at least two monitoring nodes is connected to a bus, and the monitoring nodes are connected to each other in pairs.

Optionally, the method further comprises: supplying, by at least two power supplies in the monitoring system, power to the service board and the monitoring system, and dissipating, by at least two heat dissipation devices in the monitoring system, heat from the service board, wherein the at least two heat dissipation devices and the at least two power supplies are all connected to the bus, the heat dissipation devices are connected to each other in pairs, and the power supplies are connected to each other in pairs.

Optionally, the method further comprises: collecting, by the at least two monitoring nodes through the bus, the state information of the following devices: the at least two monitoring nodes, the at least two heat dissipation devices and the at least two power supplies.

Optionally, after the at least two monitoring nodes collect the state information of each device through the bus, the method further includes: determining, by the at least two monitoring nodes, an abnormal device according to the collected state information, performing alarm processing on the abnormal device, and broadcasting a processing result on the bus.

Optionally, the method further comprises: transmitting, by the monitoring devices in the at least two heat dissipation devices, the state information of the heat dissipation devices to the at least two monitoring nodes through the bus; and transmitting, by the monitoring devices in the at least two power supplies, the state information of the power supplies to the at least two monitoring nodes through the bus.

Optionally, monitoring the service board by the at least two monitoring nodes in the monitoring system includes: collecting, by the at least two monitoring nodes, the state information of the service board and broadcasting it on the bus, wherein different monitoring nodes correspond to different identification numbers, and different identification numbers correspond to different priorities for broadcasting the state information of the service board.

Optionally, the method further comprises: when one of the monitoring nodes connected in pairs fails, restarting and/or upgrading the version of the failed monitoring node by the other monitoring node.

According to yet another embodiment of the present invention, there is also provided a storage medium including a stored program, wherein the program, when executed, performs any one of the above methods.

According to yet another embodiment of the present invention, there is also provided a processor for running a program, wherein the program, when run, performs any one of the above methods.

According to the present invention, the monitoring system comprises at least two monitoring nodes for monitoring the service board; each of the monitoring nodes is connected to the bus, and the monitoring nodes are connected to each other in pairs. Because the monitoring nodes are interconnected through the bus and each pair of monitoring nodes is also directly connected, the information of every monitoring node can be collected over the bus. This solves the problem in the related art that the state of a monitoring node cannot be effectively collected, allows that state to be collected effectively, and enhances the robustness of the monitoring system.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a diagram of a monitoring system architecture in the related art;

FIG. 2a is a first structural diagram of a monitoring system according to an embodiment of the present invention;

FIG. 2b is a second structural diagram of a monitoring system according to an embodiment of the present invention;

FIG. 3 is a flowchart of a monitoring method according to an embodiment of the present invention;

FIG. 4 is a flowchart of a method according to an embodiment of the present invention.

Detailed Description

The invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other provided that they do not conflict.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

Example 1

FIG. 2a is a first structural diagram of a monitoring system according to an embodiment of the present invention. As shown in FIG. 2a, the monitoring system includes at least two monitoring nodes for monitoring the service board; each of the monitoring nodes is connected to the bus, and the monitoring nodes are connected to each other in pairs. Compared with the tree-shaped monitoring architecture of the prior art (shown in FIG. 1), the monitoring nodes in this embodiment are connected in a ring through a bus: each monitoring node is connected to the bus and can broadcast to the other monitoring nodes on it, so each monitoring node can receive the status information of the other monitoring nodes through the bus. In addition, because the monitoring nodes are connected in pairs, the two nodes of each pair can monitor each other and broadcast the monitored status information on the bus, so that every monitoring node in the system is itself monitored. This solves the problem in the related art that the state of a monitoring node cannot be effectively collected, allows that state to be collected effectively, and enhances the robustness of the monitoring system.

The partner nodes shown in FIG. 2b are devices connected in pairs, the two devices of each pair being partners of each other. That is, two devices of the same type are connected through a JTAG debug line to form partner nodes.
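The pairing of same-type devices can be sketched as follows. This is a minimal illustration under assumptions not stated in the patent: each device is described by a hypothetical `Device(kind, slot)` record, and partners are formed by sorting same-kind devices by slot and pairing them consecutively. The patent only says that same-type devices are connected two by two through JTAG.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    kind: str   # hypothetical labels such as "monitor", "fan", "power"
    slot: int   # slot or position number in the chassis


def pair_partners(devices):
    """Map each device to its partner of the same kind.

    Assumption: same-kind devices are sorted by slot and paired
    consecutively; a leftover device (odd count) gets no partner.
    """
    by_kind = defaultdict(list)
    for dev in devices:
        by_kind[dev.kind].append(dev)

    partners = {}
    for group in by_kind.values():
        group.sort(key=lambda d: d.slot)
        # Take the sorted group two at a time: (0, 1), (2, 3), ...
        for a, b in zip(group[0::2], group[1::2]):
            partners[a] = b
            partners[b] = a
    return partners


devices = [Device("monitor", 1), Device("monitor", 2), Device("fan", 10),
           Device("fan", 11), Device("power", 20), Device("power", 21)]
partners = pair_partners(devices)
print(partners[Device("fan", 10)])  # Device(kind='fan', slot=11)
```

Pairing by adjacent slots is only one possible rule. In a real chassis, the pairs are fixed by how the JTAG lines are physically wired.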

In an optional embodiment, the system further includes at least two heat dissipation devices and at least two power supplies, wherein the at least two heat dissipation devices and the at least two power supplies are all connected to the bus; the heat dissipation devices are connected to each other in pairs and are connected to the service board; and the power supplies are connected to each other in pairs and are connected to the service board. In this embodiment, the heat dissipation devices may be fans configured to dissipate heat from the service board and the monitoring system, and the at least two power supplies supply power to the service board and the monitoring system. A monitoring node can adjust the supply voltage of the power supplies and the heat dissipation intensity of the heat dissipation devices according to the collected state of the service board. In addition, other similar devices in the monitoring system can also be connected in pairs to monitor each other, which improves monitoring accuracy.
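The paragraph above says that heat dissipation intensity can be adjusted from the collected board state. The sketch below shows one way to do this. It assumes a linear fan curve between two hypothetical thresholds (`T_LOW_C`, `T_HIGH_C`) and a hypothetical duty-cycle range, and it drives the fans from the hottest board temperature received over the bus. Running at full speed when no data has arrived is also an assumed fail-safe policy.

```python
# Hypothetical fan-curve parameters; real values depend on the hardware.
T_LOW_C = 40.0     # at or below this temperature, run at minimum duty
T_HIGH_C = 80.0    # at or above this temperature, run at full duty
MIN_DUTY = 30.0    # percent
MAX_DUTY = 100.0   # percent


def fan_duty(board_temps_c):
    """Map the hottest reported board temperature to a fan duty cycle (%)."""
    if not board_temps_c:
        # No board data received: fail safe by running the fans at full speed.
        return MAX_DUTY
    hottest = max(board_temps_c.values())
    if hottest <= T_LOW_C:
        return MIN_DUTY
    if hottest >= T_HIGH_C:
        return MAX_DUTY
    # Linear interpolation between the two thresholds.
    fraction = (hottest - T_LOW_C) / (T_HIGH_C - T_LOW_C)
    return MIN_DUTY + fraction * (MAX_DUTY - MIN_DUTY)


# Temperatures keyed by slot number, as collected from bus broadcasts.
print(fan_duty({1: 45.0, 2: 60.0, 3: 52.5}))  # 65.0
```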

In an optional embodiment, the at least two monitoring nodes are further configured to collect, through the bus, status information of the following devices: the at least two monitoring nodes, the at least two heat dissipation devices and the at least two power supplies. In this embodiment, the status information of a device may be operating status information or fault information. When fault information is collected, each monitoring node broadcasts it on the bus according to its own priority and raises an alarm.

In an optional embodiment, the at least two monitoring nodes are further configured to determine an abnormal device according to the collected state information, perform alarm processing on the abnormal device, and broadcast a processing result on the bus. The abnormal device may be a monitoring node, a service board, a heat dissipation device or a power supply. Alarm processing includes issuing an alarm and displaying the fault information of the device.
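The determination of abnormal devices can be sketched as follows. The node checks each collected status record, raises an alarm for every abnormal device, and broadcasts a processing result. The `Status` fields, the limit values and the result format are hypothetical, because the patent does not specify them. The `broadcast` callable stands in for sending a message on the bus.

```python
from dataclasses import dataclass


@dataclass
class Status:
    device: str          # e.g. "monitor-1", "fan-A", "power-B", "board-3"
    temperature_c: float
    power_w: float
    fault: bool = False  # fault flag reported by the device itself


# Hypothetical limits used to classify a device as abnormal.
MAX_TEMP_C = 85.0
MAX_POWER_W = 400.0


def find_abnormal(statuses):
    """Return {device: [reasons]} for every device judged abnormal."""
    abnormal = {}
    for s in statuses:
        reasons = []
        if s.fault:
            reasons.append("fault reported")
        if s.temperature_c > MAX_TEMP_C:
            reasons.append("over temperature")
        if s.power_w > MAX_POWER_W:
            reasons.append("over power")
        if reasons:
            abnormal[s.device] = reasons
    return abnormal


def handle(statuses, broadcast):
    """Raise alarms for abnormal devices and broadcast the processing result."""
    abnormal = find_abnormal(statuses)
    for device, reasons in abnormal.items():
        print(f"ALARM {device}: {', '.join(reasons)}")
    broadcast({"abnormal": sorted(abnormal)})
    return abnormal


sent = []
handle([Status("board-3", 90.0, 120.0), Status("fan-A", 35.0, 20.0)], sent.append)
print(sent)  # [{'abnormal': ['board-3']}]
```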

In an optional embodiment, monitoring devices are respectively disposed in the at least two heat dissipation devices and the at least two power supplies, where the monitoring devices in the heat dissipation devices are configured to transmit status information of the heat dissipation devices to the at least two monitoring nodes through the bus, and the monitoring devices in the power supplies are configured to transmit status information of the power supplies to the at least two monitoring nodes through the bus. In this embodiment, the heat dissipation devices and power supplies that are connected in pairs can also monitor each other; that is, paired devices of the same type monitor each other, and each can take corresponding action according to the state of its partner.

In an optional embodiment, the at least two monitoring nodes are further configured to collect status information of the service board and broadcast the status information to the bus, where different monitoring nodes correspond to different identification numbers, and different identification numbers correspond to different priorities for broadcasting the status information of the service board. In this embodiment, the identification number may be the number of the slot in which the monitoring node is installed, or identification information of the monitoring node itself. It is mainly used to order broadcasts: the higher a node's priority, the earlier it broadcasts.
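On a CAN bus, a frame with a numerically lower identifier wins arbitration when several nodes transmit at once. Slot-number priority can therefore be mapped directly onto the identifier. The sketch below assumes an 11-bit standard identifier split into a hypothetical 4-bit message-type field and a 7-bit slot field. This layout is an illustration and is not taken from the patent. `arbitration_order` only models the order in which contending frames would win the bus.

```python
SLOT_BITS = 7                    # hypothetical slot field: slots 0..127
MAX_SLOT = (1 << SLOT_BITS) - 1
TYPE_BITS = 4                    # hypothetical message-type field


def can_id(msg_type, slot, smaller_slot_first=True):
    """Build an 11-bit CAN identifier laid out as [msg_type:4][slot field:7].

    A numerically lower identifier wins CAN arbitration, so the slot field is
    inverted when larger slot numbers are meant to have higher priority.
    """
    if not 0 <= slot <= MAX_SLOT:
        raise ValueError("slot out of range")
    if not 0 <= msg_type < (1 << TYPE_BITS):
        raise ValueError("msg_type out of range")
    slot_field = slot if smaller_slot_first else MAX_SLOT - slot
    return (msg_type << SLOT_BITS) | slot_field


def arbitration_order(pending):
    """Order contending (slot, identifier) frames lowest identifier first."""
    return sorted(pending, key=lambda item: item[1])


STATUS = 0x2  # hypothetical message type for board status reports
pending = [(slot, can_id(STATUS, slot)) for slot in (5, 1, 3)]
print([slot for slot, _ in arbitration_order(pending)])  # [1, 3, 5]
```

Because the slot number is part of the identifier, no two nodes send the same identifier. This is a general requirement for CAN arbitration to work correctly.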

In an optional embodiment, when one of the monitoring nodes connected in pairs fails, the other monitoring node restarts and/or upgrades the version of the failed monitoring node. In this embodiment, the monitoring nodes connected in pairs monitor each other, which increases the robustness of the monitoring system.

Example 2

In this embodiment, a monitoring method is provided. FIG. 3 is a flowchart of the monitoring method according to an embodiment of the present invention. As shown in FIG. 3, the flow includes the following step:

Step S302: at least two monitoring nodes in the monitoring system monitor the service board, wherein each of the at least two monitoring nodes is connected to the bus, and the monitoring nodes are connected to each other in pairs.

In this embodiment, one monitoring node of the at least two monitoring nodes monitors one service board, and monitoring nodes connected in pairs monitor each other.

Through the above step, at least two monitoring nodes in the monitoring system monitor the service board, each of the monitoring nodes is connected to the bus, and the monitoring nodes are connected to each other in pairs. That is, all monitoring nodes can broadcast on the bus, and the monitoring nodes can monitor each other. This solves the problem in the related art that the state of a monitoring node cannot be effectively collected, allows that state to be collected effectively, and enhances the robustness of the monitoring system.

In an optional embodiment, the method further includes: at least two power supplies in the monitoring system supply power to the service board and the monitoring system, and at least two heat dissipation devices in the monitoring system dissipate heat from the service board, wherein the at least two heat dissipation devices and the at least two power supplies are connected to the bus, the heat dissipation devices are connected to each other in pairs, and the power supplies are connected to each other in pairs. The connection relationship between these devices is shown in detail in FIG. 2b.

In an optional embodiment, the method further includes: the at least two monitoring nodes collect, through the bus, the state information of the following devices: the at least two monitoring nodes, the at least two heat dissipation devices and the at least two power supplies. In this embodiment, these devices may also include other devices connected to the bus.

In an optional embodiment, after the at least two monitoring nodes collect the status information of each device through the bus, the method further includes: the at least two monitoring nodes determine an abnormal device according to the collected state information, perform alarm processing on the abnormal device and broadcast the processing result on the bus. That is, each monitoring device on the bus can receive the broadcast result of the alarm processing.

In an optional embodiment, the method further includes: the monitoring devices in the at least two heat dissipation devices transmit the state information of the heat dissipation devices to the at least two monitoring nodes through the bus; and the monitoring devices in the at least two power supplies transmit the state information of the power supplies to the at least two monitoring nodes through the bus.

In an optional embodiment, the monitoring of the service board by the at least two monitoring nodes in the monitoring system comprises: the at least two monitoring nodes collect the state information of the service board and broadcast the state information to the bus, wherein different monitoring nodes correspond to different identification numbers, and different identification numbers correspond to different priorities for broadcasting the state information of the service board. In this embodiment, a plurality of monitoring nodes can receive the status information of the service board, thereby improving the monitoring accuracy.

In an optional embodiment, the method further includes: when one of the monitoring nodes connected in pairs fails, the other monitoring node restarts and/or upgrades the version of the failed monitoring node. In this embodiment, the version of a monitoring node may also be upgraded when no failure has occurred.

The present invention is described in detail below with reference to a specific example.

Specific Example 1

In the communication field, a large-scale switch is composed of various boards, such as main control boards, line cards and switch boards. The operating state of each board (corresponding to the monitoring node) must be monitored, and abnormal boards must be identified and handled. If the monitoring system and the service system of a board run on the same Central Processing Unit (CPU), an abnormality of the board may also make the monitoring system abnormal, so the monitoring function is lost. A monitoring system independent of the service is therefore needed.

The current monitoring system of a large-scale switch uses a center-and-node monitoring mode. A monitoring board is installed on each service board to monitor the running state of that board, and the monitoring board is independent of the service board. Each monitoring board collects the running state information of its board and reports it to the central monitoring board. Meanwhile, the power supplies, fans and other components of the chassis also upload their state information to the central monitoring board. After collecting this information, the central monitoring board judges the running state of the system and then issues corresponding alarms or takes corrective actions. This monitoring system has several defects. First, it depends on the central monitoring board: if the central monitoring board is damaged, the whole monitoring system cannot work normally. Second, the nodes are independent of each other, so if one monitoring node cannot work normally, its state information cannot be collected effectively. The current monitoring architecture only knows the state of a monitoring node up to the moment it loses communication, and it cannot effectively collect the node's state after the problem occurs. Such a monitoring system is therefore not robust.

To solve the above technical problems, this embodiment provides a distributed monitoring system (corresponding to the monitoring system described above), which makes the whole monitoring system more stable. The concept of a central monitoring node is eliminated. Every node (corresponding to the monitoring node) acts as a central node, and the nodes synchronize messages with each other. The nodes include not only the monitoring boards on the service boards but also the fans and power supplies of the chassis. Because the nodes exchange messages, each node knows the running state of the whole chassis and can take corresponding monitoring actions. For example, a fan node knows the temperature of each board and can regulate its speed independently according to that temperature. The monitoring board on a service board knows the power consumption of the other boards and the total power of the power supplies, so it can determine whether its own board can be powered on. By logging in to any one monitoring node, the monitoring information of the whole chassis can be viewed, which makes management convenient.
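A board that knows the other boards' power consumption and the power available from the supplies can decide whether it may power on. The sketch below is one way to make that decision. It assumes that the available power is the summed rating of the supplies reported healthy, scaled by a hypothetical safety `margin`. The patent does not say how the budget or supply redundancy is calculated.

```python
def can_power_on(board_draw_w, supply_capacity_w, request_w, margin=0.9):
    """Decide whether a board requesting `request_w` watts may power on.

    board_draw_w: {slot: watts} reported by the other boards over the bus.
    supply_capacity_w: {supply_id: rated watts} for supplies reported healthy.
    margin: hypothetical fraction of the total capacity that may be committed.
    """
    budget = margin * sum(supply_capacity_w.values())
    in_use = sum(board_draw_w.values())
    return in_use + request_w <= budget


draw = {1: 300.0, 2: 250.0, 3: 280.0}
supplies = {"PSU-A": 1000.0, "PSU-B": 1000.0}
print(can_power_on(draw, supplies, request_w=400.0))  # True
```

With redundant supplies, a stricter rule would count only the capacity left after one supply fails. The sketch does not model that case.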

Most importantly, nodes of the same type can directly monitor each other in pairs. Two nodes of the same type are interconnected through a Joint Test Action Group (JTAG) interface, and each node can learn the running state of the other node's CPU (corresponding to the monitoring device) through JTAG. Therefore, a monitoring node can monitor not only the service board but also other monitoring nodes. It can collect their state after a problem occurs and respond accordingly; see FIG. 2b for details.

The monitoring architecture of this specific embodiment is convenient to manage. It also provides more effective means of collecting information and correcting problems after a fault occurs in the monitoring system itself.

FIG. 4 is a flowchart of a method according to an embodiment of the present invention. As shown in FIG. 4, the embodiment includes the following steps:

Step 401: Each node of the monitoring system is connected to the same bus (e.g., a Controller Area Network (CAN) bus), which interconnects the nodes.

Step 402: Two nodes of the same type are interconnected through JTAG. In this embodiment, they are referred to as partner nodes.

Step 403: The monitoring node collects status information from the service board and broadcasts it on the bus. To avoid bus collisions, priority may be assigned by slot number, with either smaller or larger slot numbers having higher priority. The monitoring node collects information such as the temperature and power consumption of the board and broadcasts it over the CAN bus in the priority order given by slot number, so higher-priority data is broadcast first. The data structure is roughly as follows:
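The data structure referred to in step 403 is not reproduced in this text. As a purely hypothetical illustration, the sketch below packs a board status report into the 8-byte data field of a classic CAN frame. The fields are assumptions: the slot number, temperature in units of 0.1 °C, power in units of 0.1 W, a flags byte and two reserved bytes. They are packed little-endian with Python's standard `struct` module.

```python
import struct
from typing import NamedTuple

# Hypothetical 8-byte payload: slot, temperature (0.1 degC), power (0.1 W),
# status flags, two reserved bytes. "<" selects little-endian, no padding.
STATUS_FMT = "<BhHBxx"
assert struct.calcsize(STATUS_FMT) == 8  # fits a classic CAN data field

FLAG_FAULT = 0x01  # hypothetical flag bit marking a reported fault


class BoardStatus(NamedTuple):
    slot: int
    temperature_c: float
    power_w: float
    flags: int


def pack_status(s):
    """Encode a status report into an 8-byte CAN payload."""
    return struct.pack(STATUS_FMT, s.slot, round(s.temperature_c * 10),
                       round(s.power_w * 10), s.flags)


def unpack_status(payload):
    """Decode an 8-byte CAN payload back into a status report."""
    slot, temp_dc, power_dw, flags = struct.unpack(STATUS_FMT, payload)
    return BoardStatus(slot, temp_dc / 10, power_dw / 10, flags)


msg = BoardStatus(slot=3, temperature_c=47.5, power_w=215.3, flags=0)
payload = pack_status(msg)
print(len(payload), unpack_status(payload))
```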

Step 404: The node receives the messages sent by the other nodes and stores the information locally. Based on the collected information, it monitors the operation of the whole system. If an abnormality occurs, it sends an alarm or performs other corresponding processing, and it broadcasts the processing result to inform the other nodes. When a node has a problem, its partner board repairs it according to the actual situation.

Step 405: The monitoring node with the highest priority performs its processing first.

Steps 406 to 410: The monitoring node receives information from the other nodes, including the fans and power supplies, and from it learns the running state of the whole chassis. It then judges, according to the monitoring information, whether the power consumption, temperature and similar parameters of its board are abnormal, and it performs corresponding processing. It also broadcasts the state and information of its own board. The node with the next priority then repeats these operations, and so on.

If a monitoring node hangs or crashes, it can no longer operate normally. If its partner board receives no information from the node within a certain time, the partner board considers the node to be offline.

The partner node first resets the node through JTAG. If communication has not resumed after a reset period, the partner reads the CPU registers of the problem node through JTAG and determines the cause of the problem from the register contents. It then reports an alarm or takes the next repair action according to the specific cause.
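The partner board's detect-reset-diagnose sequence can be sketched as a small supervisor. The sketch makes several assumptions. The timeout values are illustrative. The `jtag` object, with its `reset()` and `read_registers()` methods, is a hypothetical stand-in for a real JTAG adapter and register map. The `diagnose` rule is also invented. A fake clock and a fake adapter are included so the escalation can be traced without hardware.

```python
import time


class PartnerSupervisor:
    """Watches one partner node and escalates recovery when it goes silent."""

    def __init__(self, jtag, timeout_s=3.0, reset_wait_s=5.0,
                 clock=time.monotonic):
        self.jtag = jtag                  # hypothetical JTAG adapter object
        self.timeout_s = timeout_s        # silence that marks the partner offline
        self.reset_wait_s = reset_wait_s  # time allowed to recover after a reset
        self.clock = clock
        self.last_seen = clock()
        self.reset_at = None              # when the partner was last reset
        self.diagnosis = None             # cached result of the register read

    def on_message(self):
        """Call whenever a bus message from the partner is received."""
        self.last_seen = self.clock()
        self.reset_at = None
        self.diagnosis = None

    def poll(self):
        """Check the partner once and return the action taken."""
        now = self.clock()
        if now - self.last_seen < self.timeout_s:
            return "ok"
        if self.reset_at is None:
            # Partner considered offline: first try a reset through JTAG.
            self.jtag.reset()
            self.reset_at = now
            return "reset"
        if now - self.reset_at < self.reset_wait_s:
            return "waiting"
        # Still silent after the reset period: read CPU registers once.
        if self.diagnosis is None:
            self.diagnosis = diagnose(self.jtag.read_registers())
        return "alarm: " + self.diagnosis


def diagnose(regs):
    """Hypothetical rule: classify the failure from a saved exception code."""
    return "cpu exception" if regs.get("exception", 0) else "unknown hang"


class FakeJtag:
    """Stand-in for a real JTAG adapter, used only for this demonstration."""

    def __init__(self):
        self.resets = 0

    def reset(self):
        self.resets += 1

    def read_registers(self):
        return {"exception": 1}


now = [0.0]
sup = PartnerSupervisor(FakeJtag(), clock=lambda: now[0])
actions = []
for t in (1.0, 4.0, 6.0, 10.0):
    now[0] = t
    actions.append(sup.poll())
print(actions)  # ['ok', 'reset', 'waiting', 'alarm: cpu exception']
```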

Another important function of the partner board is upgrading its counterpart. If a node hangs because of a program problem and cannot be upgraded through the CAN bus, the partner board can upgrade the program of the problem node through JTAG.
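The upgrade path described above can be sketched as follows: try the CAN bus first, and fall back to the partner's JTAG link if that fails. `can_upgrade` and `jtag_upgrade` are hypothetical callables that return True on success. Treating a `TimeoutError` as a hung node is also an assumption.

```python
def upgrade_node(image, can_upgrade, jtag_upgrade):
    """Upgrade a node's program, preferring the CAN bus over JTAG.

    can_upgrade / jtag_upgrade are hypothetical callables returning True on
    success. Returns the channel that succeeded, or None if both failed.
    """
    try:
        if can_upgrade(image):
            return "can"
    except TimeoutError:
        pass  # a hung node may never answer on the bus
    # The bus route failed (e.g. the node's program hangs), so the partner
    # board programs the node directly through its JTAG link.
    if jtag_upgrade(image):
        return "jtag"
    return None


def hung_node_over_can(image):
    raise TimeoutError("no response on CAN bus")


print(upgrade_node(b"firmware", hung_node_over_can, lambda img: True))  # jtag
```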

In summary, the monitoring system adopts a decentralized architecture in which any monitoring node knows the operating state of the whole system, and logging in to any node allows the whole monitoring system to be monitored and controlled. Most importantly, the invention provides a method by which monitoring nodes monitor each other. A partner node can directly collect the running state of its counterpart, which helps recover the problem node and locate the fault, and it can also upgrade its counterpart online. This effectively improves the robustness of the whole monitoring system. The distributed architecture makes the monitoring system more stable, the monitoring information more accurate, and the monitoring responses more timely.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

An embodiment of the present invention further provides a storage medium including a stored program, where the program executes any one of the methods described above.

Optionally, in this embodiment, the storage medium may be configured to store program code for performing the above steps.

Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

An embodiment of the present invention further provides a processor configured to run a program, where the program, when run, performs the steps of any of the above methods.

Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A monitoring system, comprising:

at least two monitoring nodes for monitoring the service board;

each monitoring node of the at least two monitoring nodes is connected to the bus;

and the monitoring nodes are connected to each other in pairs.

2. The system of claim 1, further comprising:

at least two heat dissipation devices and at least two power supplies;

wherein the at least two heat dissipation devices and the at least two power supplies are all connected to the bus;

the at least two heat dissipation devices are connected to each other in pairs, and the at least two heat dissipation devices are connected to the service board;

and the at least two power supplies are connected to each other in pairs, and the at least two power supplies are connected to the service board.

3. The system of claim 2, wherein the at least two monitoring nodes are further configured to collect status information via the bus for:

the at least two monitoring nodes, the at least two heat dissipation devices and the at least two power supplies.

4. The system according to claim 3, wherein the at least two monitoring nodes are further configured to determine an abnormal device according to the collected status information, perform alarm processing on the abnormal device, and broadcast a processing result on the bus.

5. The system according to claim 2, wherein monitoring devices are respectively disposed in the at least two heat dissipation devices and the at least two power supplies, wherein the monitoring devices in the heat dissipation devices are configured to transmit status information of the heat dissipation devices to the at least two monitoring nodes through the bus, and the monitoring devices in the power supplies are configured to transmit status information of the power supplies to the at least two monitoring nodes through the bus.

6. The system of claim 1, wherein the at least two monitoring nodes are further configured to collect status information of the service board and broadcast the status information to the bus, wherein different monitoring nodes correspond to different identification numbers, and different identification numbers correspond to different priorities for broadcasting the status information of the service board.

7. The system according to claim 1, characterized in that when one of the monitoring nodes connected in pairs fails, the other monitoring node is used to restart and/or upgrade the failed monitoring node.

8. A method of monitoring, comprising:

at least two monitoring nodes in the monitoring system monitor the service board, wherein each of the at least two monitoring nodes is connected to the bus, and the monitoring nodes are connected to each other in pairs.

9. The method of claim 8, further comprising:

at least two power supplies in the monitoring system supply power to the service board and the monitoring system, and at least two heat dissipation devices in the monitoring system dissipate heat from the service board, wherein,

the at least two heat dissipation devices and the at least two power supplies are both connected to the bus;

the at least two heat dissipation devices are connected to each other in pairs;

and the at least two power supplies are connected to each other in pairs.

10. The method of claim 9, further comprising:

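the at least two monitoring nodes collect the state information of the following devices through the bus:

the at least two monitoring nodes, the at least two heat dissipation devices and the at least two power supplies.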

11. The method of claim 10, wherein after the at least two monitoring nodes collect status information of the devices via the bus, the method further comprises:

the at least two monitoring nodes determine an abnormal device according to the collected state information, perform alarm processing on the abnormal device and broadcast a processing result on the bus.

12. The method of claim 9, further comprising:

the monitoring devices in the at least two heat dissipation devices transmit the state information of the heat dissipation devices to the at least two monitoring nodes through the bus;

and the monitoring devices in the at least two power supplies transmit the state information of the power supplies to the at least two monitoring nodes through the bus.

13. The method of claim 8, wherein the monitoring of the service board by the at least two monitoring nodes in the monitoring system comprises:

the at least two monitoring nodes collect the state information of the service board and broadcast the state information to the bus, wherein different monitoring nodes correspond to different identification numbers, and different identification numbers correspond to different priorities for broadcasting the state information of the service board.

14. The method of claim 8, further comprising:

when one of the monitoring nodes connected pairwise fails, the other monitoring node is used for restarting and/or upgrading the version of the failed monitoring node.

15. A storage medium comprising a stored program, wherein the program when executed performs the method of any one of claims 8 to 14.

16. A processor, configured to run a program, wherein the program when running performs the method of any one of claims 8 to 14.