
CN107632918B - Monitoring system and method for computing storage equipment - Google Patents

Monitoring system and method for computing storage equipment

Info

Publication number
CN107632918B
CN107632918B CN201710763344.1A CN201710763344A CN107632918B CN 107632918 B CN107632918 B CN 107632918B CN 201710763344 A CN201710763344 A CN 201710763344A CN 107632918 B CN107632918 B CN 107632918B
Authority
CN
China
Prior art keywords
alarm
log
information
computing storage
storage device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710763344.1A
Other languages
Chinese (zh)
Other versions
CN107632918A (en)
Inventor
栾英杰
宋允东
刘文曜
袁丁
宋辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN201710763344.1A priority Critical patent/CN107632918B/en
Publication of CN107632918A publication Critical patent/CN107632918A/en
Application granted granted Critical
Publication of CN107632918B publication Critical patent/CN107632918B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Debugging And Monitoring (AREA)
  • Alarm Systems (AREA)

Abstract

An embodiment of the invention provides a monitoring method and a monitoring system for computing storage equipment. The method comprises the following steps: acquiring, in real time, alarm information generated by the existing monitoring system of each computing storage device to be monitored, and parsing from the alarm information the device identification information and error-report related information of the computing storage device that raised the alarm; selecting, from preset script files, a script file corresponding to the alarm information according to the device identification information and the error-report related information, and generating a log capture file; and selecting a corresponding log collection mode according to the log capture file, and using the parameters to trigger the computing storage device that raised the alarm to execute the selected script file, so as to capture the log of that device. The scheme helps reduce human resource costs and the learning burden on professionals while improving fault location efficiency.

Description

Monitoring system and method for computing storage equipment
Technical Field
The invention relates to the technical field of monitoring processing of non-network IT equipment such as servers, disk drives, tape libraries and the like, in particular to a monitoring system and a monitoring method of computing storage equipment.
Background
For operation and maintenance work, existing monitoring systems do not adequately account for large data centers that use large quantities of equipment from many brands. Bank data centers additionally impose strict security requirements, and no automatic hardware monitoring and log collection tool covering different brands and types of equipment is currently available on the market. Meanwhile, as data centers grow ever larger, the traditional passive operation and maintenance mode, in which faults are investigated only after alarm information is seen, is inefficient: faults cannot be located promptly, and production stability suffers. The manpower and material resources this mode requires also grow geometrically, and the staff workload is heavy. For example, each manufacturer has its own log package and its own way of generating it, so every time server alarm information is received, someone must go to the machine room to collect the alarm information of the server or disk machine on site. This is time-consuming and laborious, affects security, and can only be done by personnel with a certain level of specialized skill.
Disclosure of Invention
An embodiment of the invention provides a monitoring method for computing storage equipment, which aims to solve the technical problems in the prior art that the equipment operation and maintenance mode is inefficient, faults cannot be located promptly, and the work is time-consuming and laborious. The method comprises the following steps: acquiring, in real time, alarm information generated by the existing monitoring system of each computing storage device to be monitored, and parsing from the alarm information the device identification information and error-report related information of the computing storage device that raised the alarm; selecting, from preset script files, a script file corresponding to the alarm information according to the device identification information and the error-report related information, and generating a log capture file, wherein the log capture file comprises the selected script file, a log collection mode, and the parameters required to trigger the selected script file; and selecting a corresponding log collection mode according to the log capture file, and using the parameters to trigger the computing storage device that raised the alarm to execute the selected script file, so as to capture the log of that device.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the processor implements any of the above-mentioned monitoring methods for the computing storage device.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the monitoring method of the computing storage device is stored in the computer-readable storage medium.
An embodiment of the invention also provides a monitoring system for computing storage equipment, which aims to solve the technical problems in the prior art that the equipment operation and maintenance mode is inefficient, faults cannot be located promptly, and the work is time-consuming and laborious. The system comprises a data acquisition device, a data comparison device, and a log generation device. The data acquisition device is connected with the existing monitoring system of each computing storage device to be monitored, and is used for acquiring, in real time, the alarm information generated by those monitoring systems and parsing from the alarm information the device identification information and error-report related information of the computing storage device that raised the alarm. The data comparison device is used for selecting, from preset script files, a script file corresponding to the alarm information according to the device identification information and the error-report related information, and generating a log capture file, wherein the log capture file comprises the selected script file, a log collection mode, and the parameters required to trigger the selected script file. The log generation device is used for selecting a corresponding log collection mode according to the log capture file, and using the parameters to trigger the computing storage device that raised the alarm to execute the selected script file, so as to capture the log of that device.
In the embodiment of the invention, by acquiring in real time the alarm information generated by the existing monitoring system of each computing storage device to be monitored, the alarm information of various types of computing storage devices can be collected under centralized monitoring. The device identification information and error-report related information of the computing storage device that raised the alarm are parsed from the alarm information, the script file corresponding to the alarm information is selected according to that information to generate a log capture file, and finally the log of the computing storage device that raised the alarm is captured according to the log capture file. When a computing storage device raises an alarm, staff no longer need to go to the machine room to collect the alarm information of the server, disk machine, or other device on site: the log of the device that raised the alarm is captured automatically from the collected alarm information, and front-line operation and maintenance staff can react quickly on the basis of the output log to determine the final alarm handling scheme. Using this monitoring method greatly reduces manual involvement, improves the standardization of hardware operation and maintenance, reduces human resource costs and the learning burden on professionals, and improves fault location efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic diagram of a monitoring system for computing storage devices according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a data acquisition apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data comparison apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a log generating apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data filtering apparatus according to an embodiment of the present invention;
FIG. 6 is a flowchart of a monitoring method for a computing storage device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
In an embodiment of the present invention, a monitoring system for a computing storage device is provided, as shown in fig. 1, the system includes: data acquisition device 001, data comparison device 002, log generation device 003 and data filtering device 004. The data acquisition device 001 is connected with the existing monitoring system of each computing storage device to be monitored, and is used for acquiring alarm information generated by the existing monitoring system of each computing storage device to be monitored in real time and analyzing device identification information and error report related information of the computing storage device to be monitored, which generates an alarm, according to the alarm information;
the data comparison device 002 is configured to select a script file corresponding to the alarm information from preset script files according to the device identification information and the error report related information, and generate a log capture file, where the log capture file includes the selected script file, a log collection mode, and parameters required for triggering the selected script file;
the log generating device 003 is configured to select a corresponding log collection mode according to the log capture file, and to use the parameters to trigger the computing storage device that raised the alarm to execute the selected script file, so as to capture the log of that device.
As shown in FIG. 1, in the embodiment of the present invention, by acquiring in real time the alarm information generated by the existing monitoring system of each computing storage device to be monitored, the alarm information of various types of computing storage devices can be collected under centralized monitoring. The device identification information and error-report related information of the computing storage device that raised the alarm are parsed from the alarm information, the script file corresponding to the alarm information is selected according to that information to generate the log capture file, and finally the log of the computing storage device that raised the alarm is captured according to the log capture file. When a computing storage device raises an alarm, staff no longer need to go to the machine room to collect the alarm information of the server, disk machine, or other device on site: the log of the device that raised the alarm is captured automatically from the collected alarm information, and front-line operation and maintenance staff can react quickly on the basis of the output log to determine the final alarm handling scheme. Using this monitoring method greatly reduces manual involvement, improves the standardization of hardware operation and maintenance, reduces human resource costs and the learning burden on professionals, and improves fault location efficiency.
In specific implementation, the log refers to a record of the running state of hardware, for example, IQYY of an IBM high-end server.
In a specific implementation, the computing storage device to be monitored may be a non-network IT device such as a server, a disk drive, or a tape library. As shown in FIG. 1, it may be, for example, an IBM high-end server, an IBM low-end server, an HP X86 server, a DELL X86 server, a Huawei X86 server, an Inspur X86 server, a Huawei disk drive, an HDS disk drive, and the like. The target servers and disk drives must have in-band agent monitoring deployed in advance according to the existing monitoring mode, or have the corresponding monitoring functions configured and started on the device management machine and the out-of-band monitoring module. These monitoring functions constitute the existing monitoring systems of the computing storage devices to be monitored.
In specific implementation, in order to achieve data acquisition, in this embodiment, as shown in fig. 2, the data acquisition device 001 includes:
the real-time monitoring module 102 is configured to connect to an existing monitoring system of each to-be-monitored computing and storage device 101, and collect, in real time, alarm information generated by the existing monitoring system of each to-be-monitored computing and storage device;
the monitoring and collecting module 103 is configured to analyze the alarm information into presentation data in a uniform format, add a distinguishing mark on each alarm information according to a source of the alarm information, where the distinguishing mark is used to distinguish a type of a to-be-monitored computing storage device that generates an alarm corresponding to the alarm information, and analyze device identification information and error reporting related information of the to-be-monitored computing storage device that generates the alarm according to the alarm information, where the device identification information includes a device type and an IP address of the to-be-monitored computing storage device that generates the alarm, and the error reporting related information includes an error reporting code and error reporting description information that generate the alarm;
and the keyword collection module 104 is configured to store the analyzed device identification information and the error reporting related information.
In a specific implementation, the real-time monitoring module 102 may be implemented by a plurality of servers, each of which monitors computing storage devices of a different server or disk machine type. For example, for Huawei and Inspur servers, the existing monitoring system adopts an out-of-band monitoring mode and is isolated from the service IP; for HP and DELL X86 servers, the existing monitoring system installs an agent on the operating system of the target machine to acquire monitoring information; for IBM high-end servers and Huawei and Hitachi disk machines, the existing monitoring system has a dedicated device management console machine to manage and monitor the target machine. To collect the alarm information of each computing storage device in real time, the real-time monitoring module 102 is connected to the existing monitoring system of each computing storage device and makes full use of those systems to obtain the alarm information.
When the real-time monitoring module 102 is implemented by a plurality of servers, the servers are designed in advance in a load-balanced configuration to preserve the ability to scale out later, and more than one server can acquire the alarm information of a given computing storage device. This absorbs the performance and capacity pressure caused by rapid growth in the number of computing storage devices, reduces acquisition waiting time, and avoids the service interruption that a single machine without redundancy could cause.
In a specific implementation, the monitoring and collecting module 103 parses the unordered alarm information collected by the real-time monitoring module 102 into unified, standardized display data. At the same time, a distinguishing mark (which may be the IP of the corresponding monitoring server in the real-time monitoring module 102) is added to each piece of alarm information according to its source (for example, the monitoring server of the real-time monitoring module 102 through which it arrived), so as to distinguish which type of monitored device the alarm information comes from. The specific actions of the monitoring and collecting module 103 can be described as follows: it screens the relevant fields and parses from the alarm information the device identification information and error-report related information required to capture the log of the computing storage device that raised the alarm (the device identification information may include the IP and the device type, and the error-report related information may include the error code, the error description information, and the like); and it performs a coarse preliminary filtering of the collected alarm information to remove the most basic information and noise, the filtered-out information being alarm information that the manufacturer has stated clearly does not indicate a serious problem. Specifically, regarding the IP information, for some machines (that is, the computing storage devices to be monitored) the IP of the service server is parsed directly, while for others the IP of the management console machine (such as an HMC) is determined from the service IP in combination with the configuration management system. The device type can be determined from the distinguishing mark of the alarm information.
The error code can be analyzed and located in combination with the type of the machine that reported the error; for error description information without a code, the key description fields need to be screened out.
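The parsing performed by the monitoring and collecting module 103 can be sketched roughly as follows. This is only an illustrative sketch: the field names `node_ip` and `message`, the mapping from monitoring-server IP to device type, and the error-code pattern are all assumptions made for the example and do not come from the embodiment.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping: IP of the collecting monitoring server (the
# distinguishing mark) -> type of device that server watches.
SOURCE_TO_DEVICE_TYPE = {
    "10.0.0.11": "IBM high-end server",
    "10.0.0.12": "HP X86 server",
}

# Assumed error-code shape for illustration only: one capital letter
# followed by seven hexadecimal digits, e.g. B1812345.
_CODE_RE = re.compile(r"\b[A-Z][0-9A-F]{7}\b")


@dataclass
class Alarm:
    device_type: str   # derived from the distinguishing mark
    ip: str            # IP of the device that raised the alarm
    error_code: str    # empty when the alarm carries no code
    description: str   # error description text


def parse_alarm(raw: dict, source_ip: str) -> Alarm:
    """Normalize one raw alarm record into the unified format."""
    text = raw.get("message", "")
    match = _CODE_RE.search(text)
    return Alarm(
        device_type=SOURCE_TO_DEVICE_TYPE.get(source_ip, "unknown"),
        ip=raw.get("node_ip", ""),
        error_code=match.group(0) if match else "",
        description=text.strip(),
    )
```

When no code is found, `error_code` stays empty and the description text is kept so that the key description fields can still be screened downstream.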
In specific implementation, as shown in fig. 3, the data comparing device 002 includes:
a data comparison module 201, connected to the keyword collection module 104, configured to compare the parsed device identification information and error-report related information with pre-stored alarm history data and to determine whether each piece of alarm information belongs to an alarm condition that needs to be processed, where the pre-stored alarm history data is alarm information that does not belong to an alarm condition that needs to be processed;
the result feedback module 202 is used for directly outputting and displaying the alarm information when the alarm information does not belong to the alarm condition needing to be processed so as to prompt a user that the alarm information does not need to be processed;
the scheme strategy making module 203 is used for selecting a script file corresponding to the alarm information from preset script files according to the equipment identification information and the error report related information corresponding to the alarm information when the alarm information belongs to the alarm condition needing to be processed, and generating a log capture file;
and the scheme parameter transmission module 205 is used for sending the log capture file to the to-be-monitored computing storage device which generates the alarm.
In specific implementation, the data comparison module 201 compares the analyzed device identifier information and the error report related information with the pre-stored alarm history data to determine whether each alarm message belongs to an alarm condition that needs to be processed. Specifically, the pre-stored alarm history data may be implemented in the form of an alarm history database, which may be periodically updated according to the type of the newly added machine and the newly added information such as the alarm that is determined to be negligible.
In specific implementation, when the alarm information is determined not to belong to the alarm condition that needs to be processed, the result feedback module 202 directly outputs and displays the alarm information to prompt a user (for example, a monitoring attendant) that the alarm information does not need to be processed and can be ignored.
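The comparison against the alarm history data can be illustrated with a minimal sketch. The history is modelled here as a set of (device type, error code) pairs known to be ignorable; this representation, the sample entry, and the returned labels are assumptions for illustration, since the embodiment only says that the history may be an alarm history database.

```python
# Hypothetical alarm history data: pairs that have been confirmed as
# not needing processing. The entry below is invented for the example.
IGNORABLE = {("HP X86 server", "INFO0001")}


def needs_processing(device_type, error_code, history=IGNORABLE):
    """Return True when the alarm is NOT found in the ignorable history."""
    return (device_type, error_code) not in history


def route(alarm, history=IGNORABLE):
    """Decide which module handles the alarm next."""
    if needs_processing(alarm["device_type"], alarm["error_code"], history):
        # Handed to the scheme policy making module 203.
        return "generate_log_capture_file"
    # Handed to the result feedback module 202.
    return "display_as_ignorable"
```

Updating the history periodically then amounts to adding or removing pairs in the set.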
In a specific implementation, when the alarm information belongs to an alarm condition that needs to be processed, the scheme policy making module 203 determines from the device identification information which type of device the alarm comes from, and then determines which log collection mode to adopt according to information such as the error code and the error description information in the error-report related information. For example, error codes may be interpreted using prior knowledge, that is, the manufacturer's error codes indicate in advance which logs need to be collected. The log collection modes generally include collecting BMC logs through out-of-band monitoring, collecting the relevant logs from the operating system through in-band monitoring, and triggering a log collection tool through the device management console machine. A script file corresponding to the alarm information is then selected from the preset script files according to the device identification information and the error-report related information. For example, script files of various versions are provided in advance for the different existing machine types and log collection modes, each associated with preset parameters; the device identification information and the error-report related information are compared with the preset parameters, and the matching script file is selected. The preset script files may include, for example, a script file for the Linux system of an IBM high-end HMC, a script file for a Windows disk drive controller, a script file for out-of-band monitoring of servers, and so on.
In specific implementation, when the scheme policy making module 203 generates the log capture file, the log capture file may include a selected script file, a log collection mode, and parameters required for triggering the selected script file, and may further include a login mode (for example, a user name and a password for logging in) for logging in the to-be-monitored computing storage device that issues an alarm.
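As a rough sketch of how a log capture file might be assembled, the following example looks up a preset script by device type and bundles it with the collection mode, the trigger parameters, and optional login details. The script file names, the JSON layout, and the key names are hypothetical; the embodiment does not prescribe a file format.

```python
import json

# Hypothetical preset table: device type -> script file and collection mode.
PRESET_SCRIPTS = {
    "IBM high-end server": {"script": "ibm_hmc_collect.sh", "mode": "management-console"},
    "HP X86 server": {"script": "x86_os_collect.sh", "mode": "in-band"},
    "DELL X86 server": {"script": "x86_os_collect.sh", "mode": "in-band"},
}


def build_log_capture_file(device_type, ip, error_code, login=None):
    """Select the preset script and serialize a log capture file as JSON."""
    preset = PRESET_SCRIPTS.get(device_type)
    if preset is None:
        raise KeyError(f"no preset script for device type {device_type!r}")
    capture = {
        "script": preset["script"],
        "collection_mode": preset["mode"],
        "parameters": {"ip": ip, "error_code": error_code},
    }
    if login is not None:
        # Optional login mode, e.g. {"method": "ssh", "user": "..."}.
        capture["login"] = login
    return json.dumps(capture, sort_keys=True)
```

Raising an error for an unknown device type is a design choice of this sketch; a real system might instead fall back to manual handling.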
In a specific implementation, the scheme parameter transmission module 205 may feed back the log capture file generated by the scheme policy making module 203 to a server running a service, and in order to ensure that the service is not affected, in this embodiment, the data comparison device 002 further includes: the file security detection module 204 is configured to perform security detection on the log capture file before the scheme parameter transmission module 205 sends the log capture file, that is, perform security detection on the log capture file feedback packet, so as to prevent malicious codes or virus files from invading a service machine.
In a specific implementation, as shown in fig. 4, the log generation device 003 includes:
the trigger mode selection module 301 is connected with the scheme parameter transmission module and is used for selecting the corresponding log collection mode, according to the log capture file, for triggering the computing storage device that raised the alarm;
a trigger execution module 302, configured to trigger the to-be-monitored computing storage device that generates an alarm to execute the selected script file by using the parameter in the log capture file, so as to capture a log of the to-be-monitored computing storage device that generates an alarm;
the log collection judging module 303 is used for judging whether the log was collected successfully by detecting whether a file package generated on the corresponding date exists under a preset path, and by checking the size of that file package;
and the log result feedback module 304 is used for storing the collected log and forwarding the log.
In a specific implementation, the trigger mode selection module 301 selects, according to the log capture file, the log collection mode with which to trigger the computing storage device that raised the alarm. For example, it determines whether to collect BMC logs through out-of-band monitoring, to collect the relevant logs from the operating system through in-band monitoring, or to trigger a log collection tool through the device management console machine.
In a specific implementation, the log collection judging module 303 judges whether the log was collected successfully by detecting whether a file package generated on the corresponding date exists under the preset path and by checking its size. For example, simply checking whether a file package dated the current day exists under the relevant path, and how large it is, gives a preliminary indication of whether collection succeeded. If collection did not succeed, control returns to the trigger mode selection module 301 and the trigger execution module 302, which identify the collection mode again and re-execute the script. If collection succeeded, the log package is passed to the log result feedback module 304 for storage.
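The success check and retry described above can be sketched as follows. The minimum size threshold, the number of attempts, and the use of the file's modification time as its generation date are assumptions of this example rather than details given in the embodiment.

```python
import datetime
import os


def log_collected(path, min_size=1, today=None):
    """Preliminary check: does a package dated today exist with enough bytes?"""
    today = today or datetime.date.today()
    if not os.path.isfile(path):
        return False
    info = os.stat(path)
    # Assumption: the modification time stands in for the generation date.
    generated_on = datetime.date.fromtimestamp(info.st_mtime)
    return generated_on == today and info.st_size >= min_size


def collect_with_retry(run_script, path, attempts=3):
    """Run the collection script until the check passes or attempts run out."""
    for _ in range(attempts):
        run_script()
        if log_collected(path):
            return True
    return False
```

Here `run_script` is any callable that triggers the collection, so the same loop works for out-of-band, in-band, or console-triggered collection.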
In a specific implementation, the log result feedback module 304 transmits the collected log to the alarm monitoring platform through a relevant protocol (such as FTP) for use by on-site duty staff; alternatively, one copy is transmitted to a transfer machine on the production intranet for use by external second-line engineers.
In specific implementation, for a data center for deploying important information, in view of security and uncertainty of data information, in this embodiment, as shown in fig. 1 and fig. 5, the monitoring system for computing storage devices further includes a data filtering device 004, where the data filtering device 004 includes:
the log security detection module 401 is configured to detect whether the captured log contains service information;
the log auditing and processing module 402 is used for confirming the integrity of the captured log, and selecting a forwarding object for the captured log according to the alarm condition corresponding to the alarm information;
and a log sending-out module 403, configured to provide an interface for downloading a complete log without service information.
In a specific implementation, the log security detection module 401 mainly screens, in an automated manner, whether service information has been mixed into a captured log package, with particular attention to logs collected by in-band monitoring. The log auditing and processing module 402 may further confirm the integrity of the log package through human intervention and select different forwarding targets according to the actual situation. The log sending-out module 403 provides an interface for downloading complete logs that contain no service information, so that the captured logs can be transferred to a log analysis engineer through the FTP or SMTP protocol.
The following describes the working process of the monitoring system of the computing storage device, for example, the computing storage device to be monitored is exemplified by an IBM high-end server:
(1) The data acquisition device 001 obtains the Hardware class alarm information from the ITDW database through the statement "select from report _ STATUS _ wherewithserviceerror [ - ] 6 AND ComponentType = 'Hardware'".
(2) The data comparison device 002 matches events against a preset configuration, for example:
{ "Hitachi storage failure": { configuration information 1 }, "IBM P hardware failure": { "SRC!E": configuration information 2 } }
This matches an event whose alarm information contains the field "IBM P hardware failure" and whose SRC is not Exxxxxxx, and calls the subsequent module;
configuration information 2: { "ip": nodeip, "user name": "hscroot", "password": "", "login mode": "ssh", "port": "22", "cmd": "ibmpseriesh", "path": "/var/log/hsc/iqyy" }
(3) The "configuration information 2" in the data comparison device 002 directs the log generation device 003 to execute the relevant script, and records the relevant IP address and other information. The log generation device 003 consists of a series of scripts; it retrieves the information recorded by the data comparison device 002 to generate the script that is actually executed.
The content of ibmpseriesh is, for example:
mysu-;
cd /var/log/hsc;
zip -r iqyy.zip ./*;
After it is confirmed that iqyy.zip dated the current day has been generated, "configuration information 2" is pushed to the data filtering device 004;
(4) The data filtering device 004 extracts the remote log file to the local machine according to the path information in "configuration information 2" pushed by the log generating device 003, archives the log file, and records the directory information in the repository:
mkdir nodeip+date;
ftp
open ip
"user name"
"password"
bin
lcd local directory\nodeip+date
get /var/log/hsc/iqyy.zip
bye
(5) An operator can access the log server via FTP or HTTP to extract the log data.
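Steps (2) to (4) of the working process above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the rule table layout, the function names, the use of an `ssh` argument list, and the underscore in the local directory name are all hypothetical. `fetch_archive` uses Python's standard `ftplib` and needs a reachable FTP server, so it is shown but not exercised.

```python
import os
from datetime import date
from ftplib import FTP

# Hypothetical rule table: required alarm text, an SRC prefix that must NOT
# match (or None), and the configuration to hand on.
RULES = [
    ("Hitachi storage failure", None, "configuration information 1"),
    ("IBM P hardware failure", "E", "configuration information 2"),
]


def match_rule(description, src=""):
    """Step (2): return the configuration name of the first matching rule."""
    for text, excluded_prefix, config_name in RULES:
        if text not in description:
            continue
        if excluded_prefix and src.startswith(excluded_prefix):
            continue  # e.g. SRC codes starting with "E" are not handled
        return config_name
    return None


def build_remote_command(config, script_lines):
    """Step (3): argument list for running the script lines over ssh."""
    target = f'{config["user"]}@{config["ip"]}'
    return ["ssh", "-p", str(config["port"]), target, "; ".join(script_lines)]


def local_archive_dir(base_dir, node_ip, day):
    """Step (4): local directory named from the node IP and the date."""
    return os.path.join(base_dir, f"{node_ip}_{day:%Y%m%d}")


def fetch_archive(config, password, remote_path, base_dir, day=None):
    """Step (4): download the archive in binary mode into the local directory."""
    target_dir = local_archive_dir(base_dir, config["ip"], day or date.today())
    os.makedirs(target_dir, exist_ok=True)
    local_path = os.path.join(target_dir, os.path.basename(remote_path))
    with FTP(config["ip"]) as ftp:  # counterpart of "open ip"
        ftp.login(user=config["user"], passwd=password)
        with open(local_path, "wb") as handle:
            # counterpart of "bin" followed by "get"
            ftp.retrbinary(f"RETR {remote_path}", handle.write)
    return local_path
```

The argument list returned by `build_remote_command` could be passed to `subprocess.run`; passing a list rather than a shell string avoids quoting problems on the local side.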
Based on the same inventive concept, embodiments of the present invention further provide a monitoring method for a computing storage device, as described in the following embodiments. Because the method solves the problem on a principle similar to that of the monitoring system for the computing storage device, its implementation can refer to the implementation of the system, and overlapping details are not repeated here.
Fig. 6 is a flowchart of a monitoring method for a computing storage device according to an embodiment of the present invention, and as shown in fig. 6, the method includes:
Step 601: acquiring, in real time, alarm information generated by the existing monitoring system of each computing storage device to be monitored, and analyzing the alarm information to obtain the device identification information and error-report-related information of the computing storage device that generated the alarm;
Step 602: selecting a script file corresponding to the alarm information from preset script files according to the device identification information and the error-report-related information, and generating a log capture file, wherein the log capture file comprises the selected script file, a log collection mode, and the parameters required for triggering the selected script file;
Step 603: triggering, according to the log capture file, the computing storage device that generated the alarm to select the corresponding log collection mode, and triggering that device to execute the selected script file with the parameters, so as to capture the log of the computing storage device that generated the alarm.
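One way to picture the log capture file of step 602 is as a small record bundling the three items the text names: the selected script, the log collection mode, and the triggering parameters. The sketch below is an assumption-laden illustration; `PRESET_SCRIPTS`, its `"B"` error-code prefix, and the field names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from (device type, error-code prefix) to a preset script.
PRESET_SCRIPTS = {
    ("IBM P", "B"): "ibmpseriesh",
}


@dataclass
class LogCaptureFile:
    script: str                 # the selected script file
    collection_mode: str        # the log collection mode, e.g. "ssh"
    parameters: dict = field(default_factory=dict)  # values needed to trigger the script


def build_capture_file(device_type, ip, error_code, collection_mode="ssh"):
    """Step 602: pick a preset script and bundle it with mode and parameters."""
    for (known_type, prefix), script in PRESET_SCRIPTS.items():
        if known_type == device_type and error_code.startswith(prefix):
            params = {"ip": ip, "error_code": error_code}
            return LogCaptureFile(script, collection_mode, params)
    return None  # no preset script covers this alarm
```

Step 603 would then consume such a record on the alarming device, choosing the collection mode and running the named script with the recorded parameters.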
In one embodiment, acquiring the alarm information in real time and analyzing it to obtain the device identification information and error-report-related information includes: acquiring, in real time, the alarm information generated by the existing monitoring system of each computing storage device to be monitored; converting the alarm information into display data in a uniform format and adding a distinguishing mark to each piece of alarm information according to its source, the mark being used to distinguish the type of the computing storage device that generated the alarm; analyzing the alarm information to obtain the device identification information and error-report-related information of the computing storage device that generated the alarm, wherein the device identification information comprises the device type and IP address of that device, and the error-report-related information comprises the error code and error description of the alarm; and storing the obtained device identification information and error-report-related information.
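The uniform format and distinguishing mark described in this embodiment might look like the following sketch. The raw field names (`type`, `ip`, `code`, `desc`) and the output keys are hypothetical; real monitoring systems will each need their own mapping.

```python
def normalize_alarm(raw, source):
    """Convert a raw alarm record into a uniform dictionary tagged with its source."""
    return {
        "source": source,                          # distinguishing mark for the alarm's origin
        "device_type": raw.get("type", "unknown"),
        "ip": raw.get("ip", ""),
        "error_code": raw.get("code", ""),
        "description": raw.get("desc", ""),
    }


def store_alarms(raw_alarms, source, store):
    """Normalize each raw alarm and append it to an in-memory store (a list)."""
    for raw in raw_alarms:
        store.append(normalize_alarm(raw, source))
    return store
```

A list stands in for the storage step here; a database table with the same columns would serve the same purpose.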
In one embodiment, selecting a script file corresponding to the alarm information from the preset script files according to the device identification information and the error-report-related information, and generating a log capture file, includes: comparing the obtained device identification information and error-report-related information with pre-stored alarm history data to judge whether each piece of alarm information belongs to an alarm condition that needs to be processed, wherein the pre-stored alarm history data is alarm information that does not need to be processed; when the alarm information does not need to be processed, directly outputting and displaying it to inform the user that no processing is required; when the alarm information needs to be processed, selecting the corresponding script file from the preset script files according to the device identification information and error-report-related information of that alarm, and generating a log capture file; and sending the log capture file to the computing storage device that generated the alarm.
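The comparison against alarm history can be sketched as a simple split. It assumes, hypothetically, that the history is a set of (device type, error code) pairs known not to need processing; the patent does not specify the comparison key.

```python
def triage(alarms, ignorable_history):
    """Split alarms into those needing log capture and those only displayed."""
    to_process, display_only = [], []
    for alarm in alarms:
        key = (alarm["device_type"], alarm["error_code"])
        if key in ignorable_history:
            display_only.append(alarm)   # shown to the user, no action required
        else:
            to_process.append(alarm)     # a log capture file is generated for these
    return to_process, display_only
```

Using a set for the history keeps each lookup constant-time, which matters when alarms arrive continuously.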
In one embodiment, the method further comprises: performing security detection on the log capture file before it is sent.
In one embodiment, capturing the log of the computing storage device that generated the alarm according to the log capture file includes: triggering, according to the log capture file, the computing storage device that generated the alarm to select the corresponding log collection mode; triggering that device to execute the selected script file with the parameters in the log capture file so as to capture its log; judging whether the log was collected successfully by checking whether a file package generated on the corresponding date exists under the preset path and by checking the size of the file package; and storing and forwarding the collected log.
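The success check described here (a package exists under the preset path, was generated on the expected date, and has a plausible size) can be sketched as follows. The `min_size` threshold and the use of the file's modification time as its generation date are assumptions.

```python
import os
from datetime import date, datetime


def collected_ok(path, day, min_size=1):
    """Return True when the package exists, was modified on `day`, and is large enough."""
    if not os.path.isfile(path):
        return False
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime).date()
    return modified == day and info.st_size >= min_size
```

For example, `collected_ok("/var/log/hsc/iqyy.zip", date.today())` would report whether today's archive is present and non-empty.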
In one embodiment, the method further comprises: detecting whether the captured log contains service information; confirming the integrity of the captured log and selecting a forwarding target for it according to the alarm condition corresponding to the alarm information; and providing an interface for downloading a complete log that does not contain service information.
In this embodiment, a computer device is further provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the monitoring method of the computing storage device described in any one of the above is implemented.
There is also provided in the present embodiment a computer-readable storage medium storing a computer program for executing the monitoring method of the computing storage device described in any one of the above.
In the embodiments of the invention, alarm information generated by the existing monitoring system of each computing storage device to be monitored is collected in real time, so that alarms from the various types of computing storage devices are gathered for centralized monitoring. The alarm information is analyzed to obtain the device identification information and error-report-related information of the device that generated the alarm, a script file corresponding to the alarm information is selected according to that information to generate a log capture file, and the log of the alarming device is then captured according to the log capture file. When an alarm occurs on a computing storage device, the user no longer needs to go to the machine room to collect alarm information from the server, the disk machine, and the like on site; the log of the alarming device is captured automatically according to the collected alarm information, and front-line operation and maintenance staff can react quickly based on the output log to determine the final alarm handling scheme. The monitoring method can greatly reduce manual involvement, improve the standardization of hardware operation and maintenance, reduce human resource costs, ease the learning burden on specialists, and improve fault location efficiency.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for monitoring a computing storage device, comprising:
acquiring alarm information generated by an existing monitoring system of each computing storage device to be monitored in real time, and analyzing device identification information and error report related information of the computing storage device to be monitored, which generates an alarm, according to the alarm information, wherein the device identification information comprises a device type and an IP address of the computing storage device to be monitored, which generates the alarm;
selecting a script file corresponding to the alarm information from preset script files according to the equipment identification information and the error reporting related information, and generating a log capture file, wherein the log capture file comprises the selected script file, a log collection mode and parameters required for triggering the selected script file;
and triggering, according to the log capture file, the computing storage device to be monitored that generated the alarm to select a corresponding log collection mode, and triggering that device to execute the selected script file with the parameters, so as to capture the log of the computing storage device to be monitored that generated the alarm.
2. The method for monitoring the computing storage device according to claim 1, wherein the step of collecting alarm information generated by an existing monitoring system of each computing storage device to be monitored in real time, and analyzing device identification information and error reporting related information of the computing storage device to be monitored, in which an alarm occurs, includes:
acquiring alarm information generated by the existing monitoring system of each computing storage device to be monitored in real time;
analyzing the alarm information into display data in a uniform format, adding a distinguishing mark on each piece of alarm information according to the source of the alarm information, wherein the distinguishing mark is used for distinguishing the type of the to-be-monitored computing storage equipment which is corresponding to the alarm information and generates an alarm, and analyzing the equipment identification information and error reporting related information of the to-be-monitored computing storage equipment which generates the alarm according to the alarm information, wherein the error reporting related information comprises an error reporting code and error reporting description information which generate the alarm;
and storing the analyzed equipment identification information and the related error reporting information.
3. The method for monitoring a computing storage device according to claim 2, wherein selecting a script file corresponding to the alarm information from preset script files according to the device identification information and the error report related information, and generating a log capture file comprises:
comparing the analyzed equipment identification information and the error-reporting related information with prestored alarm historical data, and judging whether each piece of alarm information belongs to an alarm condition needing to be processed, wherein the prestored alarm historical data is the alarm information which does not belong to the alarm condition needing to be processed;
when the alarm information does not belong to the alarm condition needing to be processed, the alarm information is directly output and displayed so as to prompt a user that the alarm information does not need to be processed;
when the alarm information belongs to the alarm condition needing to be processed, selecting a script file corresponding to the alarm information from preset script files according to the equipment identification information and the error reporting related information corresponding to the alarm information, and generating a log capture file;
and sending the log capture file to the to-be-monitored computing storage equipment which generates the alarm.
4. The method of monitoring a computing storage device of claim 3, further comprising:
and before the log capture file is sent, carrying out security detection on the log capture file.
5. The method for monitoring a computing storage device according to claim 3, wherein triggering, according to the log capture file, the computing storage device to be monitored that generated the alarm to select a corresponding log collection mode and to execute the selected script file with the parameters, so as to capture the log of that device, comprises:
triggering, according to the log capture file, the computing storage device to be monitored that generated the alarm to select a corresponding log collection mode;
triggering the computing storage device to be monitored that generated the alarm to execute the selected script file with the parameters in the log capture file, so as to capture the log of that device;
judging whether the log was collected successfully by checking whether a file package generated on the corresponding date exists under the preset path and by checking the size of the file package;
and storing and forwarding the collected log.
6. The method of monitoring a computing storage device of any of claims 1 to 5, further comprising:
detecting whether the captured log contains service information or not;
confirming the integrity of the captured log, and selecting a forwarding object for the captured log according to the alarm condition corresponding to the alarm information;
and providing an interface for downloading a complete log that does not contain service information.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of monitoring a computing storage device according to any of claims 1 to 6 when executing the computer program.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that executes the monitoring method of the computing storage device according to any one of claims 1 to 6.
9. A monitoring system for a computing storage device, comprising: a data acquisition device, a data comparison device and a log generation device, wherein,
the data acquisition device is connected with the existing monitoring system of each to-be-monitored computing storage device, and is used for acquiring alarm information generated by the existing monitoring system of each to-be-monitored computing storage device in real time, analyzing equipment identification information and error report related information of the to-be-monitored computing storage device which generates an alarm according to the alarm information, wherein the equipment identification information comprises the equipment type and the IP address of the to-be-monitored computing storage device which generates the alarm;
the data comparison device is used for selecting a script file corresponding to the alarm information from preset script files according to the equipment identification information and the error report related information to generate a log capture file, wherein the log capture file comprises the selected script file, a log collection mode and parameters required by triggering the selected script file;
and the log generating device is used for triggering, according to the log capture file, the computing storage device to be monitored that generated the alarm to select a corresponding log collection mode, and for triggering that device to execute the selected script file with the parameters, so as to capture the log of the computing storage device to be monitored that generated the alarm.
10. The monitoring system of a computing storage device of claim 9, further comprising a data filtering apparatus, the data filtering apparatus comprising:
the log security detection module is used for detecting whether the captured log contains service information or not;
the log auditing and processing module is used for confirming the integrity of the captured log and selecting a forwarding object for the captured log according to the alarm condition corresponding to the alarm information;
and the log sending-out module is used for providing an interface for downloading a complete log which does not contain service information.
CN201710763344.1A 2017-08-30 2017-08-30 Monitoring system and method for computing storage equipment Active CN107632918B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710763344.1A CN107632918B (en) 2017-08-30 2017-08-30 Monitoring system and method for computing storage equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710763344.1A CN107632918B (en) 2017-08-30 2017-08-30 Monitoring system and method for computing storage equipment

Publications (2)

Publication Number Publication Date
CN107632918A CN107632918A (en) 2018-01-26
CN107632918B true CN107632918B (en) 2020-09-11

Family

ID=61100829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710763344.1A Active CN107632918B (en) 2017-08-30 2017-08-30 Monitoring system and method for computing storage equipment

Country Status (1)

Country Link
CN (1) CN107632918B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108737182A (en) * 2018-05-22 2018-11-02 平安科技(深圳)有限公司 The processing method and system of system exception
CN108897665B (en) * 2018-06-29 2021-06-15 平安科技(深圳)有限公司 Log management method and device, computer equipment and storage medium
CN111756778B (en) * 2019-03-26 2024-06-18 京东科技控股股份有限公司 Method, device and storage medium for pushing server disk cleaning script
CN110377569B (en) * 2019-06-19 2023-07-28 中国平安人寿保险股份有限公司 Log monitoring method, device, computer equipment and storage medium
CN110601879B (en) * 2019-08-30 2022-11-08 深圳壹账通智能科技有限公司 Method and device for forming Zabbix alarm process information and storage medium
CN110990214B (en) * 2019-10-31 2022-12-06 苏州浪潮智能科技有限公司 Method, system and equipment for capturing memory card logs through BMC
CN110908885B (en) * 2019-11-21 2022-08-05 苏州浪潮智能科技有限公司 Log collection method and device and related components
CN111258813A (en) * 2020-01-13 2020-06-09 北京点众科技股份有限公司 Method and equipment for automatically recovering report data
CN113448795B (en) * 2020-03-26 2024-06-28 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for obtaining system diagnostic information
CN111597095A (en) * 2020-05-20 2020-08-28 中国工商银行股份有限公司 Monitoring method, monitoring device, electronic apparatus, and medium
CN112905410B (en) * 2021-01-19 2021-11-30 中国人民解放军32039部队 Equipment state monitoring system and method
US11442733B2 (en) * 2021-01-29 2022-09-13 Seagate Technology Llc Embedded computation instruction performance profiling
CN113872793A (en) * 2021-08-11 2021-12-31 深兰科技(上海)有限公司 Method, device, medium and equipment for remotely capturing log

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1488115A (en) * 2001-01-26 2004-04-07 布莱迪卡姆公司 System for providing services and virtual programming interface
CN102158462A (en) * 2010-02-11 2011-08-17 希姆通信息技术(上海)有限公司 Method for repairing remote diagnosis by using 2nd Generation (2G) or 3rd Generation (3G) module
CN105227351A (en) * 2015-09-01 2016-01-06 上海斐讯数据通信技术有限公司 Log acquisition system, journal obtaining method and electronic equipment
CN105791417A (en) * 2016-04-13 2016-07-20 北京思特奇信息技术股份有限公司 Intelligent disposition and process monitoring system and method based on cloud management platform
CN106095682A (en) * 2016-06-15 2016-11-09 浪潮软件集团有限公司 Android application stability test method for simulating complex network
JP2017117063A (en) * 2015-12-22 2017-06-29 日本電気株式会社 Radio base station maintenance device, radio base station maintenance system, radio base station maintenance method, and radio base station maintenance program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9612827B2 (en) * 2015-06-11 2017-04-04 International Business Machines Corporation Automatically complete a specific software task using hidden tags

Also Published As

Publication number Publication date
CN107632918A (en) 2018-01-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201229

Address after: 100140, 55, Fuxing Avenue, Xicheng District, Beijing

Patentee after: INDUSTRIAL AND COMMERCIAL BANK OF CHINA

Patentee after: ICBC Technology Co.,Ltd.

Address before: 100140, 55, Fuxing Avenue, Xicheng District, Beijing

Patentee before: INDUSTRIAL AND COMMERCIAL BANK OF CHINA
