
CN103905253A - Server monitoring and management method based on Nagios and BMC - Google Patents

Server monitoring and management method based on Nagios and BMC

Info

Publication number
CN103905253A
Authority
CN
China
Prior art keywords
nagios
bmc
server
management
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410134635.0A
Other languages
Chinese (zh)
Other versions
CN103905253B (en)
Inventor
陈刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IEIT Systems Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201410134635.0A
Publication of CN103905253A
Application granted
Publication of CN103905253B
Legal status: Active

Landscapes

  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention proposes a server monitoring and management method based on Nagios and BMC. It gives Nagios interactive access to the server's out-of-band monitoring information, extends the paths available for server monitoring and management, and improves server availability. The invention mainly comprises: installing and configuring Nagios on the server nodes, implementing BMC-oriented Nagios extension plug-ins, implementing Python scripts based on the IPMI protocol, implementing the hardware communication interface between Nagios and the BMC, and a remote server monitoring and management client. Through information exchange between the Nagios tool and the BMC controller, BMC-oriented function plug-ins are added to the existing Nagios-based in-band monitoring and management system, and Nagios-oriented information acquisition is added to the existing BMC-based out-of-band monitoring and management system. Exchanging key information between the two systems improves the availability of the server system.

Description

A server monitoring and management method based on Nagios and BMC

Technical Field

The invention relates to server technology, specifically a server monitoring and management method based on Nagios and BMC.

Background

Servers play an important role in every part of today's society. In national defense, science and technology, finance and insurance, banking, energy, government, and enterprise, servers are present almost everywhere. To keep this work running stably, monitoring and managing servers effectively and in real time is a key prerequisite. There are many server monitoring tools, such as the open source tools Ganglia, Nagios, and Zabbix, and the commercial tools Pingdom, interSeptorPro, and Nimsoft. These tools can monitor every aspect of a server, from uptime and performance to security, and even the physical environment in which the server is located.

Ganglia is used to measure thousands of server nodes and provides static system data as well as important performance metrics; it is especially suited to cloud computing systems. Nagios is a server-level and network monitoring program that checks hosts and services and notifies the user when an abnormal condition occurs and when it clears. Zabbix is an open source solution with a web interface that provides distributed system monitoring and network monitoring. Pingdom monitors uptime and overall performance and generates easy-to-read tables and charts. interSeptor is an Ethernet-based data center and rack monitoring system; it monitors the environmental conditions of machine rooms and racks and issues an early warning when a system failure or another condition that may endanger business continuity occurs. NMS (Nimsoft) monitors the core resources of servers and can centrally manage remote processes and services.

With these tools it is possible to learn what problems a server may develop and to address them before they occur. A single open source tool has limited functionality, must be combined with other tools to be effective, and comes without professional technical support, but it is free. Enterprise-grade commercial tools are powerful and come with professional technical support, but they are expensive.

In recent years, monitoring and maintaining server systems with open source tools has become the mainstream solution. However, because each open source monitoring tool has its own characteristics and focus, it is difficult to install all of them in practice, and even when they can all be installed, system administrators rarely have the capacity to keep track of every tool. This creates serious problems for monitoring and management. How to integrate open source tools effectively and simplify the monitoring and management process, while still monitoring and managing servers comprehensively and accurately in single server systems and in server cluster systems, has therefore become a problem in urgent need of a solution.

Summary of the Invention

To integrate open source tools effectively, simplify the monitoring and management process, and still monitor and manage the server system comprehensively and accurately, the invention proposes a server monitoring and management method based on Nagios and BMC.

The invention comprises: installing and configuring Nagios on the server nodes, implementing BMC-oriented Nagios extension plug-ins, implementing Python scripts based on the IPMI protocol, implementing the hardware communication interface between Nagios and the BMC, and a remote server monitoring and management client.

Nagios must be installed and configured on both the management node and the managed server nodes so that it works normally, that is, so that the management node can obtain monitoring and management information about the managed nodes through Nagios.

The main purpose of the BMC-oriented Nagios extension plug-ins is to provide interface files between Nagios and the IPMI specification. A nagios_ipmi_monitor extension plug-in written in C can be downloaded from the Nagios official website. The invention implements the plug-ins in Python, a scripting language with better cross-platform support, and additionally provides interface files through which the BMC obtains Nagios monitoring information.

1. The monitor_nagios_to_bmc_plugins plug-in group, with which Nagios obtains BMC information, mainly covers reading the status of the power supply, fan, voltage, and temperature sensors, as well as energy management and log export:

   monitor_bmc_fan.py, monitor_bmc_temp.py, monitor_bmc_psu.py, monitor_bmc_psu_control.py, monitor_bmc_voltage.py, monitor_bmc_logs.py;

2. The get_nagios_to_bmc_plugins plug-in group, with which the BMC obtains Nagios information, mainly covers status information such as CPU utilization, memory utilization, hard disk utilization, process status, and network performance indicators:

   get_nagios_cpu_usgage.py, get_nagios_mem_usgage.py, get_nagios_ps_status.py, get_nagios_hdd_status.py, get_nagios_net_status.py;
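As an illustration of what a plug-in in the second group might compute, the following is a minimal sketch of a CPU utilization check. It is not the patent's code: the function names, the 80/95 percent thresholds, and the choice of Linux /proc/stat as the data source are all assumptions made for the example. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plug-in convention.

```python
# Hypothetical sketch of a get_nagios_cpu_usgage.py-style check (names are assumptions).

# Standard Nagios plug-in exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("not an aggregate cpu line: %r" % (line,))
    # Columns: user nice system idle iowait irq softirq steal [guest guest_nice]
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    return idle, sum(values)


def cpu_usage_percent(before, after):
    """Compute CPU utilization between two samples of the 'cpu' line."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("samples must be taken in increasing time order")
    busy = elapsed - (idle1 - idle0)
    return 100.0 * busy / elapsed


def nagios_state(usage, warn=80.0, crit=95.0):
    """Map a utilization percentage to a Nagios exit code (thresholds are assumed)."""
    if usage >= crit:
        return CRITICAL
    if usage >= warn:
        return WARNING
    return OK
```

In use, a plug-in would read the first line of /proc/stat twice, a short interval apart, pass both lines to cpu_usage_percent, print a one-line status message, and exit with the code returned by nagios_state.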

The hardware communication interface between Nagios and the BMC is a KCS interface based on the LPC protocol, which is suitable for CPUs on the Intel x86 platform.

The remote server monitoring and management client is installed in the BMC's web server, and users log in to it through a web browser.

Because the client is deployed on the BMC side, it provides the basic out-of-band monitoring and management functions, namely asset information detection, remote control and maintenance, system log information, and event alarms. In addition, it uses the key in-band information provided by Nagios to add in-band monitoring.

Through information exchange between the Nagios tool and the BMC controller, the invention adds BMC-oriented function plug-ins to the existing Nagios-based in-band monitoring and management system and adds Nagios-oriented information acquisition to the existing BMC-based out-of-band monitoring and management system.

The beneficial effects of the invention are:

No dedicated agent needs to be installed on the customer's server. It is only necessary to install the BMC-oriented extension plug-ins, which are based on the standard IPMI protocol, on the Nagios monitoring and management tool that the customer has already installed. This protects the security of customer information and also reduces the difficulty of system maintenance.

The customer's existing monitoring and management method gains mutual backup of information: key system information can be obtained from either the in-band system or the out-of-band system, which increases the security and availability of system operation.

The method integrates open source tools effectively and simplifies the monitoring and management process, while monitoring and managing servers comprehensively and accurately, whether in a single server system or in a larger system environment.

Brief Description of the Drawings

Fig. 1 is an architecture diagram of server monitoring and management based on Nagios and BMC according to the invention;

Fig. 2 is a flow chart of implementing server monitoring and management according to the invention.

Detailed Description

Embodiments of the invention are described below with reference to the accompanying drawings. It should be understood that the embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.

Fig. 1 is an architecture diagram of server monitoring and management based on Nagios and BMC according to the invention. The architecture mainly comprises the in-band monitoring and management tool, the operating system (OS) layer, the hardware platform layer, the BMC layer, and the out-of-band monitoring and management tool. As shown in Fig. 1, the working process is as follows:

1) The in-band monitoring and management tool may be the Nagios server, which can be installed on the central monitoring server and receives, in real time, the alarm information sent by the Nagios clients;

2) Nagios is installed on the local operating system of the managed node;

3) With the Nagios client installed, the Nagios extension plug-ins are configured, namely the monitor_nagios_to_bmc_plugins group with which Nagios obtains BMC information and the get_nagios_to_bmc_plugins group with which the BMC obtains Nagios information. The implementation of a plug-in consists of the following two parts, taking monitor_bmc_psu.py as an example:

1. Nagios configuration file

define service {
    host_name              hostname
    service_description    power supply unit
    use                    generic-service
    check_command          monitor_bmc_psu!192.168.1.99!root!superuser
}

2. Python code

#!/usr/bin/env python3
# monitor_bmc_psu.py: query or switch chassis power through the BMC with ipmitool.
import logging
import subprocess
from os import path

# Installed location of this plug-in
monitor_bmc_psu = "/usr/lib/nagios/plugins/monitor_bmc_psu"

# The log file sits next to the script, e.g. monitor_bmc_psu.log
logfile = path.splitext(__file__)[0] + '.log'

# logging usage: logger.error(message), logger.info(message)
def initlog():
    logger = logging.getLogger()
    hdlr = logging.FileHandler(logfile)
    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
    hdlr.setFormatter(formatter)
    logger.addHandler(hdlr)
    logger.setLevel(logging.NOTSET)
    return logger

logger = initlog()

# command execution
command_list = ["psu_status", "psu_poweron", "psu_poweroff"]

def run_power(action, ip, username, password):
    # Runs: ipmitool -I lan -H <ip> -U <username> -P <password> power <action>
    try:
        return subprocess.call(["ipmitool", "-I", "lan", "-H", ip,
                                "-U", username, "-P", password, "power", action])
    except OSError as error:
        # ipmitool is missing or could not be started
        logger.error(error)
        return 3  # Nagios UNKNOWN

def status(ip, username, password):
    return run_power("status", ip, username, password)

def start(ip, username, password):
    return run_power("on", ip, username, password)

def stop(ip, username, password):
    return run_power("off", ip, username, password)
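For Nagios to act on the result of a power status query, the plug-in has to turn the ipmitool output into a status line and one of the standard plug-in exit codes. The sketch below shows one way this could be done. It assumes that ipmitool reports the state as "Chassis Power is on" or "Chassis Power is off"; the function name and the decision to treat a powered-off chassis as CRITICAL are illustrative choices, not part of the patent.

```python
# Hypothetical helper: map 'ipmitool ... power status' output to a Nagios result.

# Standard Nagios plug-in exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def power_state_to_nagios(ipmitool_output):
    """Return (exit_code, status_line) for the text printed by a power status query."""
    text = ipmitool_output.strip()
    lowered = text.lower()
    if lowered.endswith("is on"):
        return OK, "PSU OK - " + text
    if lowered.endswith("is off"):
        # Policy assumption: a powered-off managed server is reported as CRITICAL.
        return CRITICAL, "PSU CRITICAL - " + text
    return UNKNOWN, "PSU UNKNOWN - unexpected output: %r" % (text,)
```

A plug-in would print the returned status line and pass the exit code to sys.exit, which is how Nagios learns the service state.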

4) In this example, the KCS interface of the BMC is configured and the KCS driver is installed in Linux;

5) The BMC firmware mainly consists of the IPMI stack, the web server, the module driver layer, and the interface layer;

6) The BMC obtains hardware asset information over the SMBus;

7) The Nagios core module can then obtain the working information of the hardware assets;

8) The out-of-band monitoring and management tool obtains monitoring and management information mainly by logging in to the BMC web server through a web browser.
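The same ipmitool query can reach the BMC in band, through the host's system interface, or out of band, over the network. The sketch below builds the argument list for either path. It assumes that the KCS driver mentioned above is the Linux OpenIPMI driver, which ipmitool addresses with "-I open"; the function name and its parameters are hypothetical.

```python
# Hypothetical helper: choose between in-band and LAN access to the BMC.

def ipmitool_args(subcommand, host=None, username=None, password=None):
    """Build an ipmitool argument list.

    With no host, use the local 'open' interface (in-band, via the kernel
    IPMI driver). With a host, use the 'lan' interface and credentials.
    """
    if host is None:
        prefix = ["ipmitool", "-I", "open"]
    else:
        if username is None or password is None:
            raise ValueError("LAN access needs a username and a password")
        prefix = ["ipmitool", "-I", "lan", "-H", host, "-U", username, "-P", password]
    return prefix + list(subcommand)
```

For example, ipmitool_args(["sdr", "type", "Fan"]) yields an in-band fan sensor query, while passing host="192.168.1.99" with credentials yields the LAN form used in the plug-in listing.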

Fig. 2 is a flow chart of implementing server monitoring and management according to the invention. As shown in Fig. 2, the implementation steps are as follows:

1) Step 1: Install Nagios on the local operating system of the managed node;

2) Step 2: With the Nagios client installed, configure the Nagios extension plug-ins;

3) Step 3: Configure the KCS interface on the BMC;

4) Step 4: Send a command to the BMC through the KCS interface;

5) Step 5: The BMC receives and parses the command sent from the in-band side;

6) Step 6: Determine the type of the command;

7) Step 7: If it is an execution command of the "To BMC" type, execute the corresponding IPMI command to obtain information such as hardware working status;

8) Step 8: Send the information to the in-band system through the KCS interface;

9) Step 9: The Nagios plug-ins parse the returned information;

10) Step 10: Display the out-of-band information on the local monitoring side;

11) Step 11: If it is an analysis command of the "From Nagios" type, analyze the information and decide whether an alarm needs to be raised;

12) Step 12: If an alarm is needed, send the alarm information to the remote monitoring and management side as an SNMP trap.
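The branch in steps 6, 7, 11, and 12 can be sketched as a small dispatcher on the BMC side. This is an illustration only: the message layout (a dict with "type", "command", "metric", and "value" keys), the threshold table, and the two callables standing in for IPMI execution and SNMP trap delivery are assumptions introduced for the example, not details given in the patent.

```python
# Hypothetical sketch of the BMC-side command dispatch in steps 6, 7, 11 and 12.

TO_BMC = "to_bmc"            # execute an IPMI command and return hardware status
FROM_NAGIOS = "from_nagios"  # analyse in-band information pushed by Nagios


def handle_command(message, run_ipmi, send_trap, thresholds):
    """Dispatch one parsed command.

    run_ipmi(command)  -> reply text; stands in for the BMC's IPMI stack
    send_trap(text)    -> stands in for SNMP trap delivery
    thresholds         -> {metric name: alarm limit}
    """
    kind = message.get("type")
    if kind == TO_BMC:
        # Step 7: run the IPMI command; the reply travels back over KCS (step 8).
        return {"type": TO_BMC, "reply": run_ipmi(message["command"])}
    if kind == FROM_NAGIOS:
        # Step 11: compare the reported value with its limit, if one is defined.
        limit = thresholds.get(message["metric"])
        alarm = limit is not None and message["value"] > limit
        if alarm:
            # Step 12: notify the remote monitoring and management side.
            send_trap("%s=%s exceeds threshold %s"
                      % (message["metric"], message["value"], limit))
        return {"type": FROM_NAGIOS, "alarm": alarm}
    raise ValueError("unknown command type: %r" % (kind,))
```

Passing the IPMI and trap functions in as parameters keeps the dispatch logic independent of any particular firmware or SNMP library, which also makes it easy to exercise with stub functions.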

With the server monitoring and management method based on Nagios and BMC of this embodiment, open source tools can be integrated effectively and the monitoring and management process of the server system simplified. Through the designed exchange of in-band and out-of-band monitoring and management information, the real-time working status of the server system can also be monitored comprehensively and accurately, which enhances the availability and operational safety of the server system. In addition, the invention relates to a method of extending Nagios with plug-ins written in the Python scripting language; an abstract design of the Python scripts makes cross-platform Nagios extension more effective.

Claims (8)

1. A server monitoring and management method based on Nagios and BMC, characterized in that it comprises:
1) a server monitoring and management architecture and interface design based on Nagios and BMC;
2) an implementation method for BMC-oriented Nagios extension plug-ins;
3) Python command script content based on the IPMI protocol;
4) features of the remote server monitoring and management client.

2. The server monitoring and management method based on Nagios and BMC according to claim 1, characterized in that, in the management architecture and interface design of point 1), the architecture comprises a server, an interaction interface layer, a baseboard management controller (BMC), and a remote management client.

3. The server monitoring and management method based on Nagios and BMC according to claim 1, characterized in that, in the management architecture and interface design of point 1), the interaction interface layer between the in-band and out-of-band management systems may include, but is not limited to, an I2C bus interface, a KCS interface based on the LPC protocol, and CPU resource sharing; the invention uses the LPC-based KCS interface.

4. The server monitoring and management method based on Nagios and BMC according to claim 1, characterized in that, in the implementation method for BMC-oriented Nagios extension plug-ins of point 2), Nagios supports extending its monitoring services with plug-ins written in languages such as perl, shell, python, and PHP; the BMC-oriented plug-in extension is a functional extension toward the standard IPMI protocol, mainly including standard IPMI commands and third-party OEM IPMI commands; the invention uses the Python scripting language.

5. The server monitoring and management method based on Nagios and BMC according to claim 4, characterized in that the BMC supports at least one tool such as ipmitool or openIPMI and at least one language such as perl, shell, python, or PHP.

6. The server monitoring and management method based on Nagios and BMC according to claim 1, characterized in that the Python command script content based on the IPMI protocol of point 3), that is, the server-side Python command script, mainly carries out the exchange of in-band and out-of-band monitoring and management information, which comprises:
(1) Nagios obtaining BMC information: after script parsing, Nagios sends IPMI commands to the BMC to obtain server fan speed information, power supply working status, sensor information such as temperature and voltage, system event log information, RAID controller information, and so on; the received information is then parsed by the Python command script for use by Nagios;
(2) the BMC obtaining Nagios information: Nagios converts information such as system CPU and memory utilization, disk utilization, and system process working status into IPMI commands through scripts and sends them to the BMC; the BMC then processes the information and sends it to the client.

7. The server monitoring and management method based on Nagios and BMC according to claim 1, characterized in that the features of the remote server monitoring and management client of point 4) mainly comprise:
(1) the client supports web access;
(2) the client is hosted on the BMC's web server;
(3) the client supports the standard IPMI protocol, the SNMP protocol, and so on;
(4) the client is divided into an interface layer, a parsing layer, a storage layer, and so on:
the interface layer provides the communication interface with in-band Nagios and accepts the commands sent by Nagios;
the parsing layer parses and executes the IPMI commands sent by Nagios;
the storage layer saves the parsing or execution results of IPMI commands in the system cache;
(5) the client is suitable for several different host types:
rack client: the BMC side of a rack server;
management client: a two-tier management architecture, that is, the BMC side of the management node and the managed nodes;
central client: a three-tier or multi-tier management architecture such as a server cluster or cloud computing, that is, the BMC side of the central management node, sub-management nodes, managed nodes, and so on.

8. The server monitoring and management method based on Nagios and BMC according to claim 1, characterized in that the design method of the remote server monitoring and management client of point 4) aims, through the exchange of in-band and out-of-band information between Nagios and the BMC, to add a BMC-based out-of-band monitoring and management system to the existing Nagios-based in-band monitoring and management system.
CN201410134635.0A 2014-04-04 2014-04-04 A kind of server monitoring management method based on Nagios and BMC Active CN103905253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410134635.0A CN103905253B (en) 2014-04-04 2014-04-04 A kind of server monitoring management method based on Nagios and BMC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410134635.0A CN103905253B (en) 2014-04-04 2014-04-04 A kind of server monitoring management method based on Nagios and BMC

Publications (2)

Publication Number Publication Date
CN103905253A true CN103905253A (en) 2014-07-02
CN103905253B CN103905253B (en) 2018-09-28

Family

ID=50996410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410134635.0A Active CN103905253B (en) 2014-04-04 2014-04-04 A kind of server monitoring management method based on Nagios and BMC

Country Status (1)

Country Link
CN (1) CN103905253B (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104360922A (en) * 2014-10-20 2015-02-18 浪潮电子信息产业股份有限公司 Method for automatically monitoring BMC working state based on ipmitool
CN104506348A (en) * 2014-12-12 2015-04-08 上海新炬网络信息技术有限公司 Method for automatically discovering and configuring monitoring object
CN105072167A (en) * 2015-07-24 2015-11-18 江苏省公用信息有限公司 Monitoring method for portal host system
CN105512004A (en) * 2015-12-11 2016-04-20 浪潮电子信息产业股份有限公司 Method for avoiding server hard disk fault caused by abnormal ambient temperature and humidity
CN105573776A (en) * 2014-11-06 2016-05-11 华为技术有限公司 Software installation method for site server and site server
CN105634814A (en) * 2016-01-05 2016-06-01 浪潮电子信息产业股份有限公司 A monitoring method for server asset information change
CN106201813A (en) * 2016-07-26 2016-12-07 浪潮电子信息产业股份有限公司 Method for automatically testing BMC stability
CN106330567A (en) * 2016-09-14 2017-01-11 郑州云海信息技术有限公司 A server management control method and system for a server cluster
CN106533792A (en) * 2016-12-12 2017-03-22 北京锐安科技有限公司 Method and device for monitoring and configuring resources
WO2017107484A1 (en) * 2015-12-23 2017-06-29 深圳市华讯方舟软件技术有限公司 Cloud computing monitoring method and device
CN107026757A (en) * 2017-04-14 2017-08-08 广东浪潮大数据研究有限公司 A kind of system and method for Server remote monitoring management
CN107491308A (en) * 2017-08-15 2017-12-19 成都民航空管科技发展有限公司 Utilize script and the system and method for plug-in unit fast custom multipoint positioning monitoring system
CN107769953A (en) * 2016-08-23 2018-03-06 佛山市顺德区顺达电脑厂有限公司 Server failure detecting system
CN108197029A (en) * 2018-01-08 2018-06-22 华为技术有限公司 A kind of method and apparatus for obtaining progress information
CN108197008A (en) * 2018-01-31 2018-06-22 郑州云海信息技术有限公司 A kind of log collecting method, system, device and computer readable storage medium
CN108984466A (en) * 2018-06-29 2018-12-11 深圳市同泰怡信息技术有限公司 The exchange method of BMC and server OS, system
CN109766110A (en) * 2018-12-27 2019-05-17 联想(北京)有限公司 A kind of control method, baseboard management controller and control system
CN109947221A (en) * 2019-02-28 2019-06-28 苏州浪潮智能科技有限公司 A method of server cooling control
CN110377136A (en) * 2019-06-18 2019-10-25 苏州浪潮智能科技有限公司 A kind of PSU original value log recording method and device
CN110781053A (en) * 2019-09-29 2020-02-11 苏州浪潮智能科技有限公司 Method and device for detecting memory degradation errors
CN115473779A (en) * 2022-07-21 2022-12-13 浪潮通信技术有限公司 Server management method and system
CN118631824A (en) * 2024-06-05 2024-09-10 江苏智先生信息科技有限公司 A medical data protection and real-time monitoring system based on device autonomous management

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040122924A1 (en) * 2002-12-18 2004-06-24 Coryell Larry G. System and method for providing a flexible framework for remote heterogeneous server management and control
CN1547346A (en) * 2003-12-15 2004-11-17 浙江大学 An open ultra-long-distance industrial monitoring information integration method and system
CN101577698A (en) * 2008-05-09 2009-11-11 中兴通讯股份有限公司 System with external intelligent management server and method for monitoring server and processing commands
CN102323905A (en) * 2011-07-21 2012-01-18 曙光信息产业股份有限公司 Remote monitoring system for Godson main board

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
魏根芽: "基于Linux的Nagios服务器监控系统的研究与实现", 《计算机与现代化》 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104360922A (en) * 2014-10-20 2015-02-18 浪潮电子信息产业股份有限公司 Method for automatically monitoring BMC working state based on ipmitool
CN105573776B (en) * 2014-11-06 2019-04-12 华为技术有限公司 Software installation method for site server and site server
CN105573776A (en) * 2014-11-06 2016-05-11 华为技术有限公司 Software installation method for site server and site server
CN104506348A (en) * 2014-12-12 2015-04-08 上海新炬网络信息技术有限公司 Method for automatically discovering and configuring monitoring object
CN104506348B (en) * 2014-12-12 2017-08-29 上海新炬网络信息技术有限公司 Method for automatically discovering and configuring monitoring object
CN105072167A (en) * 2015-07-24 2015-11-18 江苏省公用信息有限公司 Monitoring method for portal host system
CN105512004A (en) * 2015-12-11 2016-04-20 浪潮电子信息产业股份有限公司 Method for avoiding server hard disk fault caused by abnormal ambient temperature and humidity
WO2017107484A1 (en) * 2015-12-23 2017-06-29 深圳市华讯方舟软件技术有限公司 Cloud computing monitoring method and device
CN105634814A (en) * 2016-01-05 2016-06-01 浪潮电子信息产业股份有限公司 A monitoring method for server asset information change
CN106201813A (en) * 2016-07-26 2016-12-07 浪潮电子信息产业股份有限公司 Method for automatically testing BMC stability
CN107769953A (en) * 2016-08-23 2018-03-06 佛山市顺德区顺达电脑厂有限公司 Server failure detecting system
CN106330567A (en) * 2016-09-14 2017-01-11 郑州云海信息技术有限公司 A server management control method and system for a server cluster
CN106533792A (en) * 2016-12-12 2017-03-22 北京锐安科技有限公司 Method and device for monitoring and configuring resources
CN107026757A (en) * 2017-04-14 2017-08-08 广东浪潮大数据研究有限公司 System and method for remote server monitoring and management
CN107491308A (en) * 2017-08-15 2017-12-19 成都民航空管科技发展有限公司 System and method for rapidly customizing a multipoint positioning monitoring system using scripts and plug-ins
CN108197029A (en) * 2018-01-08 2018-06-22 华为技术有限公司 Method and apparatus for acquiring process information
CN108197029B (en) * 2018-01-08 2021-06-01 华为技术有限公司 A method and device for acquiring process information
CN108197008A (en) * 2018-01-31 2018-06-22 郑州云海信息技术有限公司 Log collection method, system, device and computer-readable storage medium
CN108984466A (en) * 2018-06-29 2018-12-11 深圳市同泰怡信息技术有限公司 Method and system for interaction between BMC and server OS
CN109766110A (en) * 2018-12-27 2019-05-17 联想(北京)有限公司 Control method, baseboard management controller and control system
CN109947221A (en) * 2019-02-28 2019-06-28 苏州浪潮智能科技有限公司 A method of server cooling control
CN110377136A (en) * 2019-06-18 2019-10-25 苏州浪潮智能科技有限公司 PSU original value log recording method and device
CN110781053A (en) * 2019-09-29 2020-02-11 苏州浪潮智能科技有限公司 Method and device for detecting memory degradation errors
US11853150B2 (en) 2019-09-29 2023-12-26 Inspur Suzhou Intelligent Technology Co., Ltd. Method and device for detecting memory downgrade error
CN115473779A (en) * 2022-07-21 2022-12-13 浪潮通信技术有限公司 Server management method and system
CN115473779B (en) * 2022-07-21 2024-01-09 浪潮通信技术有限公司 Server management method and system
CN118631824A (en) * 2024-06-05 2024-09-10 江苏智先生信息科技有限公司 A medical data protection and real-time monitoring system based on device autonomous management

Also Published As

Publication number Publication date
CN103905253B (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN103905253B (en) Server monitoring and management method based on Nagios and BMC
CN106610836B (en) A microservice operation management tool
US9560062B2 (en) System and method for tamper resistant reliable logging of network traffic
US8719410B2 (en) Native bi-directional communication for hardware management
US10680896B2 (en) Virtualized network function monitoring
CN105573955B (en) Multi-protocol system management method and system and computer readable medium
US10848839B2 (en) Out-of-band telemetry data collection
US11675682B2 (en) Agent profiler to monitor activities and performance of software agents
US20140082142A1 (en) System and method for accessing operating system and hypervisors via a service processor of a server
US20140122930A1 (en) Performing diagnostic tests in a data center
US10936354B2 (en) Rebuilding a virtual infrastructure based on user data
CN105528273A (en) A server host hardware monitoring method and device and an electronic apparatus
CN114553672B (en) A method, device, equipment and medium for determining performance bottleneck of an application system
Cohen et al. Introducing new deformable surfaces to segment 3D images
US9854042B2 (en) Automated assessment report generation
US8296262B1 (en) Systems and methods for real-time online monitoring of computing devices
US9307015B1 (en) Cloud black box for cloud infrastructure
US10552282B2 (en) On demand monitoring mechanism to identify root cause of operation problems
US8769088B2 (en) Managing stability of a link coupling an adapter of a computing system to a port of a networking device for in-band data communications
JP2014127210A (en) Operation scheduling system for virtual machines and its method
CN107025155A (en) Test method and device for remotely creating users based on ipmitool
WO2023276039A1 (en) Server management device, server management method, and program
Brim et al. Monitoring extreme-scale Lustre toolkit
US20250130925A1 (en) Providing automated application feedback for software testing
WO2023276038A1 (en) Server management device, server management method, and program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant