JP5258040B2

JP5258040B2 - Apparatus for supporting detection of failure event, method for supporting detection of failure event, and computer program

Info

Publication number: JP5258040B2
Application number: JP2008279292A
Authority: JP
Inventors: 祐介兼安; 泰久後藤
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2008-10-30
Filing date: 2008-10-30
Publication date: 2013-08-07
Anticipated expiration: 2028-10-30
Also published as: JP2010108225A

Description

The present invention relates to an apparatus for supporting detection of failure events, a method for supporting detection of failure events, and a computer program, each of which can maintain high failure detection accuracy without excessively increasing the amount of stored symptoms.

With the rapid development of computer technology in recent years, computer systems have become a routine part of the core systems that make up social infrastructure. Keeping social infrastructure running normally and continuously incurs considerable operating costs. Autonomic computing systems are attracting attention as a technology that reduces these operating costs as far as possible while also increasing system stability.

"Autonomic computing system" is a general term for the technologies used to build a system-wide self-managing environment, and refers to systems in general that detect problems, failures, and the like occurring in a system and resolve them autonomously. Various methods for detecting problems, failures, and the like occurring in a system have been disclosed.

For example, Patent Document 1 discloses a root cause identification method that identifies the root cause leading to the state of a subject component containing a failure by traversing the other components on which the subject component depends and a part of the relational model associated with those components, and that determines the state associated with each component. Patent Document 2 discloses a dependency management method that manages dependency information between the various components in a computing environment, in particular run-time dependencies.
JP 2005-538459 A JP 2004-103015 A

However, although the root cause identification method disclosed in Patent Document 1 can detect the root cause accurately by traversing the dependency model exhaustively from upstream to downstream, the traversal itself takes considerable time when the dependency model is complex. In addition, because the traversal order of the dependency model is not specified, performance and usability may deteriorate.

One conceivable way to solve this problem is to add system configuration information to the dependency model disclosed in Patent Document 2, but system configuration information cannot always be added when the model is generated. It is therefore desirable to prevent performance and usability from deteriorating even when a dependency model has no system configuration information.

The present invention has been made in view of these circumstances. Its object is to provide an apparatus for supporting detection of failure events, a method for supporting detection of failure events, and a computer program that can improve the efficiency with which symptoms are applied and maintain high accuracy in detecting events in which a failure has occurred, even when no system configuration information has been attached to the knowledge, such as logical expressions, needed to construct a dependency model (hereinafter referred to as "symptoms").

To achieve the above object, an apparatus for supporting detection of failure events according to a first invention comprises: history information collecting means for collecting history information of a system including a plurality of components, the history information including at least either log information of the system or failure information output from each component when a failure occurs in the system; symptom storage means for storing a symptom in which predetermined additional information is attached to a detection rule for detecting events included in the components related to a failure that has occurred; event group specifying means for specifying, based on the collected history information and the stored symptom, an event group that matches the symptom; extraction means for extracting, based on the specified event group, partial configuration information, which is system configuration information including relation information between the components that sent the respective events of the event group and other components; correctness information acquisition means for acquiring, based on the specified event group and the extracted partial configuration information, correctness information indicating whether the event in which the failure occurred was detected correctly; and update means for updating, based on the acquired correctness information and the extracted partial configuration information, the symptom on which the specification of the event group was based.

An apparatus for supporting detection of failure events according to a second invention is the apparatus of the first invention further comprising system configuration information acquisition means for acquiring system configuration information, which is configuration information of the system, wherein the symptom storage means attaches to the symptom, as additional information, partial configuration information, which is the part of the acquired system configuration information that relates to the components that sent the events on which the stored symptom is based.

Next, to achieve the above object, a method for supporting detection of failure events according to a third invention includes the steps of: collecting history information of a system including a plurality of components, the history information including at least either log information of the system or failure information output from each component when a failure occurs in the system; storing a symptom in which predetermined additional information is attached to a detection rule for detecting events included in the components related to a failure that has occurred; specifying, based on the collected history information and the stored symptom, an event group that matches the symptom; extracting, based on the specified event group, partial configuration information, which is system configuration information including relation information between the components that sent the respective events of the event group and other components; acquiring, based on the specified event group and the extracted partial configuration information, correctness information indicating whether the event in which the failure occurred was detected correctly; and updating, based on the acquired correctness information and the extracted partial configuration information, the symptom on which the specification of the event group was based.

Next, to achieve the above object, a computer program according to a fourth invention is executable on a computer that supports detection of events in which a failure has occurred in a system including a plurality of components, and causes the computer to function as: history information collecting means for collecting history information of the system, the history information including at least either log information of the system or failure information output from each component when a failure occurs in the system; symptom storage means for storing a symptom in which predetermined additional information is attached to a detection rule for detecting events included in the components related to a failure that has occurred; event group specifying means for specifying, based on the collected history information and the stored symptom, an event group that matches the symptom; extraction means for extracting, based on the specified event group, partial configuration information, which is system configuration information including relation information between the components that sent the respective events of the event group and other components; correctness information acquisition means for acquiring, based on the specified event group and the extracted partial configuration information, correctness information indicating whether the event in which the failure occurred was detected correctly; and update means for updating, based on the acquired correctness information and the extracted partial configuration information, the symptom on which the specification of the event group was based.

A computer program according to a fifth invention is the computer program of the fourth invention that further causes the computer to function as system configuration information acquisition means for acquiring system configuration information, which is configuration information of the system, and causes the symptom storage means to function as means for attaching to the symptom, as additional information, partial configuration information, which is the part of the acquired system configuration information that relates to the components that sent the events on which the stored symptom is based.

Next, to achieve the above object, an apparatus for supporting detection of failure events according to a sixth invention comprises: history information collecting means for collecting history information of a system including a plurality of components, the history information including at least either log information of the system or failure information output from each component when a failure occurs in the system; symptom storage means for storing a symptom in which predetermined additional information is attached to a detection rule for detecting events included in the components related to a failure that has occurred; event group specifying means for specifying, based on the collected history information and the stored symptom, an event group that matches the symptom; extraction means for extracting, based on the specified event group, partial configuration information, which is system configuration information including relation information between the components that sent the respective events of the event group and other components; fit degree calculation means for calculating the degree of fit between the partial configuration information extracted by the extraction means and the partial configuration information included in the symptom that the event group specifying means applied to specify the event group; determination means for determining whether the calculated degree of fit is greater than a predetermined value; and update means for updating the symptom by attaching to it, when the determination means determines that the degree of fit is greater than the predetermined value, normal detection configuration information in which information indicating normal detection is added to the partial configuration information extracted by the extraction means, and, when the degree of fit is determined to be equal to or less than the predetermined value, false detection configuration information in which information indicating false detection is added to the partial configuration information extracted by the extraction means.
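The threshold-based update described for the sixth invention can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not specify how the degree of fit is computed, so Jaccard similarity over relation triples is assumed here, and the names `Symptom`, `fit_degree`, and `update_symptom` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

# Hypothetical representation: one relation is (source, relation type, target).
Relation = Tuple[str, str, str]
PartialConfig = FrozenSet[Relation]


@dataclass
class Symptom:
    rule: str  # the detection rule, kept opaque in this sketch
    normal_configs: List[PartialConfig] = field(default_factory=list)
    false_configs: List[PartialConfig] = field(default_factory=list)


def fit_degree(extracted: PartialConfig, stored: PartialConfig) -> float:
    """Assumed metric: Jaccard similarity between two sets of relations."""
    if not extracted and not stored:
        return 1.0
    return len(extracted & stored) / len(extracted | stored)


def update_symptom(symptom: Symptom, extracted: PartialConfig, threshold: float) -> str:
    """Attach the extracted configuration as a normal or false detection."""
    # Best fit against the configurations already attached to the symptom.
    best = max((fit_degree(extracted, c) for c in symptom.normal_configs), default=0.0)
    if best > threshold:
        symptom.normal_configs.append(extracted)
        return "normal"
    # Equal to or below the threshold: record it as a false detection.
    symptom.false_configs.append(extracted)
    return "false"
```

Keeping normal and false detections in separate lists mirrors the distinction the text draws between normal detection configuration information and false detection configuration information.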

According to the present invention, even when a symptom holds no partial configuration information, partial configuration information can be attached to the symptom according to the result of detecting the event in which a failure occurred. The attached partial configuration information therefore makes it easy to determine which symptom should be applied preferentially, which improves the accuracy of detecting events in which a failure has occurred. In addition, by also storing the partial configuration information for cases in which a failure event was detected erroneously, the degree of fit with the partial configuration information in use at the time of the erroneous detection can also be presented, so that symptoms can be ranked for application more accurately on the basis of the degree of fit.
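One way to picture the ranking of symptoms by degree of fit is the sketch below. The scoring formula (best fit with normal detections minus best fit with false detections) and the Jaccard metric are assumptions made for illustration; the text states only that both kinds of fit can be used for ranking. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Relation = Tuple[str, str, str]
PartialConfig = FrozenSet[Relation]
# Hypothetical store: symptom name -> (normal detection configs, false detection configs)
SymptomStore = Dict[str, Tuple[List[PartialConfig], List[PartialConfig]]]


def jaccard(a: PartialConfig, b: PartialConfig) -> float:
    """Assumed fit metric: shared relations divided by all relations."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_symptoms(current: PartialConfig, store: SymptomStore) -> List[str]:
    """Order symptom names so the most promising one is applied first."""

    def score(name: str) -> float:
        normal, false = store[name]
        pos = max((jaccard(current, c) for c in normal), default=0.0)
        neg = max((jaccard(current, c) for c in false), default=0.0)
        # Assumed scoring: reward resemblance to past normal detections,
        # penalise resemblance to past false detections.
        return pos - neg

    return sorted(store, key=score, reverse=True)
```

A symptom whose past false detections closely resemble the current configuration is pushed down the list, which is the practical benefit of storing false detection configurations as well.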

An apparatus for supporting detection of failure events according to embodiments of the present invention is described below in detail with reference to the drawings. The following embodiments do not limit the invention described in the claims, and it goes without saying that not all combinations of the features described in the embodiments are necessarily essential to the solution.

The present invention can be implemented in many different modes and should not be construed as being limited to the description of the embodiments. The same elements are denoted by the same reference numerals throughout the embodiments.

The following embodiments describe an apparatus for supporting detection of failure events in which a computer program has been installed on a computer system. As will be apparent to those skilled in the art, however, part of the present invention can be implemented as a computer program executable on a computer. The present invention can therefore take the form of a hardware embodiment as an apparatus for supporting detection of failure events, a software embodiment, or an embodiment combining software and hardware. The computer program can be recorded on any computer-readable recording medium, such as a hard disk, DVD, CD, optical storage device, or magnetic storage device.

In the embodiments of the present invention, even when a symptom holds no partial configuration information, partial configuration information can be attached to the symptom according to the result of detecting the event in which a failure occurred. The attached partial configuration information therefore makes it easy to determine which symptom should be applied preferentially, which improves the accuracy of detecting events in which a failure has occurred. In addition, by also attaching and storing the partial configuration information for cases in which a failure event was detected erroneously, the degree of fit with the partial configuration information in use at the time of the erroneous detection can also be presented, so that symptoms can be ranked for application more accurately on the basis of the degree of fit. Here, "partial configuration information" means relation information, including dependencies, between the component that sent a failure occurrence event (an event in which a failure has occurred) and the other components, among the components that make up the system. It includes, for example, relation information and link information between an application server and a database, each of which is a component. A topology diagram of the components concerned can therefore be created accurately.

In addition to information on dependencies between components, information from which relationships useful for failure analysis can be derived can also be attached to a symptom, for example connection relationships in communication and the relationship between the acting subject and the object of an operation performed by a command, instruction, or the like. This makes it possible to identify the root cause of a failure more accurately and to narrow down more easily the conditions under which the event in which the failure occurred was detected.

(Embodiment 1)
FIG. 1 is a block diagram showing a configuration example of a failure event detection apparatus that includes an apparatus for supporting detection of failure events according to Embodiment 1 of the present invention. The failure event detection apparatus 1 according to Embodiment 1 of the present invention comprises at least a CPU (central processing unit) 11, a memory 12, a storage device 13, an I/O interface 14, a communication interface 15, a video interface 16, a portable disk drive 17, and an internal bus 18 that connects the hardware described above.

The CPU 11 is connected via the internal bus 18 to the hardware units of the failure event detection apparatus 1 described above. It controls the operation of those hardware units and executes various software functions in accordance with the computer program 100 stored in the storage device 13. The memory 12 is a volatile memory such as SRAM or SDRAM. The load module is expanded into it when the computer program 100 is executed, and it stores temporary data and the like generated during execution of the computer program 100.

The storage device 13 consists of a built-in fixed storage device (hard disk), a ROM, and the like. The computer program 100 stored in the storage device 13 is downloaded by the portable disk drive 17 from a portable recording medium 90, such as a DVD or CD-ROM, on which information such as programs and data is recorded, and at run time it is expanded from the storage device 13 into the memory 12 and executed. It may of course instead be a computer program downloaded from an external computer connected to the network 2 via the communication interface 15.

The storage device 13 also includes a symptom database 131. In the symptom database 131, each detection rule for detecting an event in which a failure has occurred is stored together with recommended actions to take when the failure is detected, comments, and the like. When the user selects an event in which a failure has occurred and enters the information needed to generate a detection rule, such as a rule pattern, the detection rule corresponding to the selected event is extracted and displayed on the display device 23 together with a topology diagram of the components.

The storage device 13 further includes a configuration information storage unit 132, which stores the system configuration information of the system being monitored for failures, and a history information storage unit 133, which stores history information such as log information of the monitored system and event information output when a failure occurs in that system. The configuration information storage unit 132 is a CCMDB (Change and Configuration Management DB) that contains dependency information between the components of the monitoring target system 200, relation information for each component, and the like. A topology diagram of the components can be displayed on the basis of the system configuration information stored in the configuration information storage unit 132. The configuration information storage unit 132 may be provided in the storage device 13, but it is normally provided separately from the failure event detection apparatus 1 according to Embodiment 1, for example in an external computer connected via the network 2.

The communication interface 15 is connected to the internal bus 18 and, by connecting to an external network 2 such as the Internet, a LAN, or a WAN, can exchange data with external computers and the like. It is also connected to the monitoring target system 200 via the network 2, so that system configuration information, history information at the time of a failure, and the like can be acquired.

The I/O interface 14 is connected to data input devices such as a keyboard 21 and a mouse 22, and accepts data input. The video interface 16 is connected to a display device 23 such as a CRT monitor or an LCD, and displays predetermined images.

FIG. 2 is a functional block diagram of the failure event detection apparatus 1 according to Embodiment 1 of the present invention. The configuration information extraction unit 201 extracts system configuration information, including relation information between the components of the monitoring target system 200, and stores it in the configuration information storage unit 132. System configuration information that includes relation information between components is, for example, connection relationship information for communication between components, or link relationship information on the relationship between operation and non-operation. The configuration information extraction unit 201 is not an essential element of the present invention: the system configuration information may be generated in the configuration information storage unit 132 in advance, and the unit may or may not be built into the failure event detection apparatus 1. In other words, the configuration information extraction unit 201 and the configuration information storage unit 132 are not essential elements of the failure event detection apparatus 1 according to Embodiment 1 of the present invention.

The configuration information acquisition unit 202 acquires the system configuration information stored in the configuration information storage unit 132. The system configuration information is stored in the configuration information storage unit 132 in association with each monitoring target system 200, and the unit acquires the system configuration information that corresponds to the monitoring target system 200 in question.

The history information collection unit 203 constantly monitors the monitoring target system 200, collects history information that includes the log information output from each component of the monitoring target system 200 and/or failure information such as event information output when a failure occurs, and stores it in the history information storage unit 133. The log information is not limited to system logs and the like that are output continuously, and may include message information and the like output by interrupt processing or similar when a failure occurs.

The history information collected by the history information collection unit 203 often differs in data format from source to source, and in that state it may not be usable as the basic information for identifying candidate events in which a failure has occurred. It is therefore desirable to provide a data format conversion unit 212 that converts the history information into a standard unified data format before it is stored in the history information storage unit 133.
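The conversion into a unified data format can be illustrated with the sketch below. Both input formats (a JSON record and a space-separated text line) and every field name are hypothetical, chosen only to show two differently shaped sources being normalised into one record type; the text does not name the formats or the unified schema.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UnifiedEvent:
    """Hypothetical unified record kept in the history information store."""
    timestamp: datetime
    component: str
    message_id: str
    text: str


def from_json_record(raw: str) -> UnifiedEvent:
    # Assumed source format: {"ts": <epoch seconds>, "source": ..., "code": ..., "msg": ...}
    rec = json.loads(raw)
    return UnifiedEvent(
        timestamp=datetime.fromtimestamp(rec["ts"], tz=timezone.utc),
        component=rec["source"],
        message_id=rec["code"],
        text=rec.get("msg", ""),
    )


def from_text_line(line: str) -> UnifiedEvent:
    # Assumed source format: "<ISO 8601 timestamp> <component> <message id> <free text>"
    stamp, component, message_id, text = line.split(" ", 3)
    return UnifiedEvent(datetime.fromisoformat(stamp), component, message_id, text)
```

Once every source is mapped to the same record type, the matching logic that applies symptoms only has to deal with one shape of event.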

The detection rule generation unit 204 generates detection rules for detecting events included in the components related to a failure that has occurred. The symptom storage unit 205 stores symptoms, each of which is a generated detection rule with predetermined additional information attached. The additional information includes information on recommended actions to take when a failure is detected, message information containing various comments, and the like.

The event detection unit 206 detects events in which a failure has occurred on the basis of the stored symptoms. For example, when it applies a symptom to which the system configuration information of the monitoring target system 200 has been attached as additional information, it can carry out event detection that takes the system configuration information into account.

The event selection reception unit 207 accepts the selection of an event included in a component related to a failure that has occurred, for example as a selection from an event list. The partial configuration information extraction unit 208 extracts, from the system configuration information acquired by the configuration information acquisition unit 202, partial configuration information, which is the system configuration information relating to the component that sent the event whose selection the event selection reception unit 207 accepted. The extracted partial configuration information includes relation information, including dependencies, between the component that sent the event selected as the failure occurrence event and the other components that make up the system. It also includes, for example, relation information and link information between an application server and a database, each of which is a component.
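A minimal sketch of this extraction step follows, assuming the system configuration information is available as a set of (source, relation type, target) triples. That representation and the function name are illustrative assumptions; the text says only that the information is held in a CCMDB.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical triple: (source component, relation type, target component)
Relation = Tuple[str, str, str]


def extract_partial_configuration(
    system_relations: Iterable[Relation],
    emitting_components: Set[str],
) -> FrozenSet[Relation]:
    """Keep only the relations that touch a component that sent a selected event."""
    return frozenset(
        rel
        for rel in system_relations
        if rel[0] in emitting_components or rel[2] in emitting_components
    )


# Example: only the relations involving AppServerA survive the extraction.
system = {
    ("AppServerA", "depends_on", "DatabaseB"),
    ("WebServerC", "connects_to", "AppServerA"),
    ("BatchD", "depends_on", "StorageE"),
}
partial = extract_partial_configuration(system, {"AppServerA"})
```

Restricting the result to relations incident to the emitting components keeps the information attached to a symptom small, which is in line with the stated aim of not inflating the amount of stored symptoms.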

図３は、本発明の実施の形態１に係る障害イベント検出装置１における部分構成情報を含むシンプトンの構成の例示図である。図３（ａ）は従来のシンプトンの構成の例示図であり、シンプトンは、論理情報としてエラーＡに起因してエラーＢが誘導される旨を示す因果関係情報を有している。 FIG. 3 is an exemplary diagram of a symptom configuration including partial configuration information in the failure event detection apparatus 1 according to Embodiment 1 of the present invention. FIG. 3A is an exemplary diagram of a conventional symptom configuration, and the symptom has causal relationship information indicating that the error B is induced due to the error A as logical information.

一方、図３（ｂ）は部分構成情報を付加したシンプトンの構成の例示図である。シンプトンは論理情報としての因果関係情報に、具体的なコンポーネントであるアプリケーションサーバＡにてエラーＡが生じることに起因して、具体的なコンポーネントであるデータベースＢにてエラーＢが生じる旨が付加されている。すなわち、コンポーネントであるアプリケーションサーバＡとデータベースＢとの依存関係に関する情報を付加することにより、障害発生イベントの検出のためのシンプトンとしてコンポーネント間の依存関係をも考慮に入れることができ、より障害発生イベントの検出精度を高めることができる。 On the other hand, FIG. 3B is an exemplary diagram of a symptom configuration to which partial configuration information is added. In addition to the causal relationship information held as logical information, the symptom records that error B occurs in database B, a specific component, as a result of error A occurring in application server A, another specific component. In other words, by adding information on the dependency relationship between application server A and database B, the symptom used to detect failure occurrence events can also take the dependency relationships between components into account, which further improves the detection accuracy of failure occurrence events.
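
The difference between the conventional symptom of FIG. 3A and the extended symptom of FIG. 3B can be sketched as a small data structure. This is only an illustrative sketch: the class names `CausalRule`, `PartialConfiguration`, and `Symptom`, and all field names, are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class CausalRule:
    """Logical information: the cause event induces the effect event."""
    cause_event: str
    effect_event: str


@dataclass
class PartialConfiguration:
    """Hypothetical container for the components and links tied to a symptom."""
    components: Dict[str, str] = field(default_factory=dict)  # name -> component type
    links: Set[Tuple[str, str]] = field(default_factory=set)  # (from, to) dependencies


@dataclass
class Symptom:
    rule: CausalRule
    recommended_action: str = ""  # example of other additional information
    partial_config: Optional[PartialConfiguration] = None


# FIG. 3A style: logical information only.
conventional = Symptom(rule=CausalRule("Error A", "Error B"))

# FIG. 3B style: the same rule, plus where each error occurs and how
# the two components depend on each other.
extended = Symptom(
    rule=CausalRule("Error A", "Error B"),
    partial_config=PartialConfiguration(
        components={
            "Application Server A": "application_server",
            "Database B": "database",
        },
        links={("Application Server A", "Database B")},
    ),
)
```

Keeping the partial configuration optional lets conventional symptoms and extended symptoms share one storage format.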

シンプトン更新部２０９は、部分構成情報抽出部２０８で抽出された部分構成情報を、対応するシンプトンに付加して記憶する。すなわち、部分構成情報が付加情報となる。部分構成情報提示部２１０は、部分構成情報抽出部２０８で抽出された部分構成情報を表示装置２３に提示する。更新受付部２１１は、提示されている部分構成情報の更新を受け付ける。これにより、ユーザは提示された部分構成情報を所望の構成に変更することにより、適合した部分構成情報を確実に生成することができる。 The symptom update unit 209 adds the partial configuration information extracted by the partial configuration information extraction unit 208 to the corresponding symptom and stores it. That is, the partial configuration information becomes additional information. The partial configuration information presentation unit 210 presents the partial configuration information extracted by the partial configuration information extraction unit 208 on the display device 23. The update reception unit 211 receives an update of the presented partial configuration information. Accordingly, the user can surely generate the adapted partial configuration information by changing the presented partial configuration information to a desired configuration.

図４は、部分構成情報提示部２１０により表示装置２３に提示される画面４０の例示図である。トポロジー図表示領域４１には、監視対象システム２００に含まれるコンポーネントの依存関係を示すトポロジー図を表示する。イベントリスト表示領域４２には、監視対象システム２００に含まれるイベントが一覧表示される。イベント選択受付部２０７により、イベントリスト表示領域４２に表示されているイベント群の中から、発生した障害に関連するコンポーネントに含まれるイベントの選択を受け付けた場合、選択を受け付けたイベント及び依存関係を有するイベントが強調表示される。図４では、選択を受け付けたイベント及び依存関係を有するイベントの表示色を変更して表示している。強調表示の方法は特に限定されるものではなく、輝度を変更しても良い。 FIG. 4 is an exemplary view of a screen 40 presented on the display device 23 by the partial configuration information presentation unit 210. The topology diagram display area 41 displays a topology diagram showing the dependency relationships of the components included in the monitoring target system 200. The event list display area 42 displays a list of the events included in the monitoring target system 200. When the event selection receiving unit 207 receives, from the event group displayed in the event list display area 42, a selection of an event included in a component related to the failure that has occurred, the selected event and the events that have a dependency relationship with it are highlighted. In FIG. 4, the selected event and the events having a dependency relationship are displayed in a different color. The highlighting method is not particularly limited, and the luminance may be changed instead.

部分構成情報表示領域４３は、部分構成情報提示部２１０により、選択を受け付けたイベントに依存するコンポーネントのトポロジー図を部分的に表示する。コンポーネント表示領域４４は、部分構成情報表示領域４３に表示されているコンポーネントの内容を詳細に表示する。部分構成情報表示領域４３に表示されているコンポーネントに対する更新情報を更新受付部２１１により受け付けることにより、部分構成情報を更新することもできる。具体的にはマウス等の操作によりイベントリスト表示領域４２に一覧表示されているイベントから改めてイベントを選択することにより更新する。 In the partial configuration information display area 43, the partial configuration information presentation unit 210 partially displays the topology diagram of the component depending on the event for which the selection is accepted. The component display area 44 displays details of the components displayed in the partial configuration information display area 43. The partial configuration information can also be updated by receiving update information for the components displayed in the partial configuration information display area 43 by the update receiving unit 211. Specifically, it is updated by selecting an event again from the events listed in the event list display area 42 by operating the mouse or the like.

部分構成情報は、コンポーネントごとの依存関係を含む情報としてシンプトンデータベース１３１に記憶される。図５は、記憶される部分構成情報の例示図である。図５（ａ）は、部分構成情報表示領域４３及びコンポーネント表示領域４４の例示図であり、図５（ｂ）は、記憶される部分構成情報のデータ例である。図５（ｂ）に示すように、部分構成情報表示領域４３に抽出されているコンポーネントごとに、コンポーネントの種類、依存関係、及び各コンポーネント間のリンク情報がコード情報として記憶される。もちろん、記憶されるデータ形式は図５に示すデータ形式に限定されるものではない。 The partial configuration information is stored in the symptom database 131 as information including the dependency relationships of each component. FIG. 5 is an exemplary diagram of the stored partial configuration information. FIG. 5A is an exemplary diagram of the partial configuration information display area 43 and the component display area 44, and FIG. 5B is a data example of the stored partial configuration information. As shown in FIG. 5B, for each component extracted in the partial configuration information display area 43, the component type, the dependency relationships, and the link information between the components are stored as code information. Of course, the stored data format is not limited to the data format shown in FIG. 5.

図６は、本発明の実施の形態１に係る障害イベント検出装置１のＣＰＵ１１の部分構成情報の付加処理の手順を示すフローチャートである。まず障害イベント検出装置１のＣＰＵ１１は、監視対象システム２００に含まれるコンポーネント間の関連情報を含むシステム構成情報を取得する（ステップＳ６０１）。もちろん、事前にシステム構成情報を取得して、構成情報記憶部１３２に記憶しておいても良い。 FIG. 6 is a flowchart showing a procedure of the partial configuration information addition processing of the CPU 11 of the failure event detection apparatus 1 according to Embodiment 1 of the present invention. First, the CPU 11 of the failure event detection apparatus 1 acquires system configuration information including related information between components included in the monitoring target system 200 (step S601). Of course, the system configuration information may be acquired in advance and stored in the configuration information storage unit 132.

ＣＰＵ１１は、監視対象システム２００に含まれる各コンポーネントから出力されたログ情報及び／又は障害発生時に出力されるイベント情報等の障害情報を含む履歴情報を収集して、履歴情報記憶部１３３に記憶する（ステップＳ６０２）。ログ情報は、常時出力されるシステムログ等に限定されるものではなく、障害発生時に割り込み処理等により出力されるメッセージ情報等を含んでも良い。 The CPU 11 collects history information including failure information such as log information output from each component included in the monitoring target system 200 and / or event information output when a failure occurs, and stores the collected history information in the history information storage unit 133. (Step S602). The log information is not limited to a system log or the like that is always output, and may include message information or the like that is output by interrupt processing or the like when a failure occurs.

なお、履歴情報収集部２０３で収集した履歴情報は、それぞれデータ形式が相違することが多く、そのままでは障害が発生したイベントの候補となるイベントを特定するための基本情報として用いることができない場合も生じうる。そこで、ステップＳ６０２にて履歴情報記憶部１３３に記憶する前に、収集した履歴情報を統一された標準のデータ形式に変換しておくことが望ましい。これにより、コンポーネントごとに異なる履歴情報であっても、統一された標準のデータ形式にて収集することができ、検出ルール生成時に全ての履歴情報を活用することができる。 Note that the history information collected by the history information collection unit 203 often differs in data format from source to source, and as-is it may not be usable as basic information for identifying the events that are candidates for the event in which the failure occurred. It is therefore desirable to convert the collected history information into a unified standard data format before storing it in the history information storage unit 133 in step S602. As a result, even history information that differs from component to component can be collected in a unified standard data format, and all of the history information can be used when generating a detection rule.
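
The conversion to a unified data format described above can be illustrated with a minimal sketch. The two input formats (a space-separated log line and a dictionary-style event), the field names, and the function names are all assumptions made for illustration; the specification does not define the standard format.

```python
from datetime import datetime, timezone


def normalize_log_line(line: str) -> dict:
    """Convert a hypothetical log line 'TIMESTAMP COMPONENT SEVERITY MESSAGE'."""
    timestamp, component, severity, message = line.split(" ", 3)
    return {
        "time": datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(
            tzinfo=timezone.utc
        ),
        "component": component,
        "severity": severity,
        "message": message,
    }


def normalize_event(event: dict) -> dict:
    """Convert a hypothetical event record that uses epoch seconds and other keys."""
    return {
        "time": datetime.fromtimestamp(event["epoch"], tz=timezone.utc),
        "component": event["src"],
        "severity": event["level"],
        "message": event["text"],
    }


# Two differently shaped inputs end up in the same record layout.
history = [
    normalize_log_line("2008-10-30T12:00:00Z appserver-a ERROR Error A"),
    normalize_event(
        {"epoch": 1225368000, "src": "database-b", "level": "ERROR", "text": "Error B"}
    ),
]
```

Once every record has the same keys, later steps such as candidate event listing can iterate over `history` without knowing which component produced each entry.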

ＣＰＵ１１は、障害発生イベントの候補となる全てのイベントを表示装置２３へ提示して（ステップＳ６０３）、ユーザによるイベントの選択を受け付ける（ステップＳ６０４）。具体的にはマウス等の操作により、例えばイベントリスト等からの選択として受け付ける。 The CPU 11 presents all the events that are candidates for the failure occurrence event to the display device 23 (step S603), and accepts the selection of the event by the user (step S604). Specifically, it is accepted as a selection from an event list or the like, for example, by operating a mouse or the like.

ＣＰＵ１１は、選択を受け付けたイベント及び取得したシステム構成情報に基づいて、選択を受け付けたイベントを送出したコンポーネントに関するシステム構成情報である部分構成情報を抽出する（ステップＳ６０５）。抽出される部分構成情報とは、システムを構成しているコンポーネントのうち障害発生イベントとして選択を受け付けたイベントを送出したコンポーネントと他のコンポーネントとの間の依存関係を含む関連情報も含んでいる。図３に示すようなアプリケーションサーバとデータベースとの関連情報も含まれる。 The CPU 11 extracts partial configuration information, which is system configuration information related to the component that has sent the selected event, based on the selected event and the acquired system configuration information (step S605). The extracted partial configuration information also includes related information including a dependency relationship between the component that sent the event that received the selection as a failure occurrence event and the other components among the components constituting the system. Information related to the application server and database as shown in FIG. 3 is also included.
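
Step S605 can be pictured as cutting a subgraph out of the system configuration. The sketch below assumes the system configuration is held as a component table plus a list of directed links, and that the partial configuration consists of the sending component and its directly linked neighbours; the specification does not fix the extraction range, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def extract_partial_configuration(
    components: Dict[str, str],
    links: List[Tuple[str, str]],
    source: str,
) -> dict:
    """Return the source component, its direct neighbours, and the links between them."""
    # Keep only the links that touch the component that sent the selected event.
    related_links = [link for link in links if source in link]
    names = {source}
    for start, end in related_links:
        names.update((start, end))
    return {
        "components": {name: components[name] for name in sorted(names)},
        "links": related_links,
    }


system_components = {
    "Application Server A": "application_server",
    "Database B": "database",
    "Web Server C": "web_server",
    "Database D": "database",
}
system_links = [
    ("Web Server C", "Application Server A"),
    ("Application Server A", "Database B"),
    ("Web Server C", "Database D"),
]

partial = extract_partial_configuration(
    system_components, system_links, "Application Server A"
)
```

In this example `Database D` is left out because it has no direct link to the component that sent the selected event.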

ＣＰＵ１１は、抽出された部分構成情報を、対応するシンプトンに付加情報として付加して記憶する（ステップＳ６０６）。このように部分構成情報をシンプトンに付加することにより、従来のように論理式だけでなく、監視対象システム２００のシステム構成情報を反映させたシンプトンを検出ルールとして生成することができる。したがって、誤検出の可能性を著しく減少させることができ、既知の障害であるか否かを正確に判別することが可能となる。 The CPU 11 adds the extracted partial configuration information as additional information to the corresponding symptom and stores it (step S606). By adding the partial configuration information to the symptom in this way, it is possible to generate a symptom that reflects the system configuration information of the monitoring target system 200 as a detection rule as well as a logical expression as in the past. Therefore, the possibility of erroneous detection can be significantly reduced, and it can be accurately determined whether or not it is a known failure.

以上のように本実施の形態１によれば、部分構成情報をシンプトンに付加することにより、記憶されているシンプトンの内容を確認することで障害が発生している根本原因をより精度良く特定することができ、どのような条件下で該障害が発生したイベントが検出されたのか容易に絞り込むことが可能となる。 As described above, according to the first embodiment, by adding the partial configuration information to the symptom, it is possible to more accurately identify the root cause of the failure by confirming the contents of the stored symptom. It is possible to easily narrow down under what conditions the event in which the failure has occurred is detected.

（実施の形態２） (Embodiment 2)
本発明の実施の形態２に係る障害イベントの検出を支援する装置を含む障害イベント検出装置１の構成は、実施の形態１と同様であることから、同一の符号を付することにより詳細な説明は省略する。本実施の形態２は、シンプトンに部分構成情報が付加されている場合に、障害検出処理に適用するシンプトンに優先順位を付する点で実施の形態１と相違する。 Since the configuration of the failure event detection apparatus 1 including the apparatus that supports the detection of failure events according to Embodiment 2 of the present invention is the same as that of Embodiment 1, the same reference numerals are used and a detailed description is omitted. Embodiment 2 differs from Embodiment 1 in that, when partial configuration information has been added to the symptoms, a priority order is assigned to the symptoms applied in the failure detection process.

図７は、本発明の実施の形態２に係る障害イベント検出装置１の機能ブロック図である。構成情報抽出部２０１は、監視対象システム２００に含まれるコンポーネント間の関連情報を含むシステム構成情報を抽出して、構成情報記憶部１３２に記憶する。コンポーネント間の関連情報を含むシステム構成情報とは、例えばコンポーネント間の通信における接続関係情報、操作・非操作の関係に関するリンク関係情報等である。なお、構成情報抽出部２０１は本発明に必須の構成要件ではなく、事前に構成情報記憶部１３２にシステム構成情報を生成しておいても良く、障害イベント検出装置１内に内蔵していてもいなくても良い。すなわち、構成情報抽出部２０１及び構成情報記憶部１３２は、本発明の実施の形態２に係る障害イベント検出装置１の必須の構成要件ではない。 FIG. 7 is a functional block diagram of the failure event detection apparatus 1 according to Embodiment 2 of the present invention. The configuration information extraction unit 201 extracts system configuration information including related information between the components included in the monitoring target system 200 and stores it in the configuration information storage unit 132. The system configuration information including related information between components is, for example, connection relationship information for communication between components, or link relationship information regarding operation and non-operation relationships. Note that the configuration information extraction unit 201 is not an essential component of the present invention: the system configuration information may be generated in the configuration information storage unit 132 in advance, and the unit may or may not be built into the failure event detection apparatus 1. That is, the configuration information extraction unit 201 and the configuration information storage unit 132 are not essential components of the failure event detection apparatus 1 according to Embodiment 2 of the present invention.

本発明の実施の形態２に係る障害イベント検出装置１における部分構成情報を含むシンプトンの構成は実施の形態１と同様であり、上述した図３（ｂ）に示す構成と同じである。すなわち、従来のシンプトンにコンポーネントであるアプリケーションサーバＡとデータベースＢとの依存関係に関する情報を追加することにより、障害発生イベントの検出のためのシンプトンとしてコンポーネント間の依存関係をも考慮に入れることができ、より障害が発生したイベントの検出精度を高めることができる。 The configuration of the symptom including the partial configuration information in the failure event detection apparatus 1 according to Embodiment 2 of the present invention is the same as that of Embodiment 1, and is the same as the configuration shown in FIG. 3B described above. That is, by adding information on the dependency relationship between application server A and database B, which are components, to the conventional symptom, the symptom used to detect failure occurrence events can also take the dependency relationships between components into account, which further improves the detection accuracy of the event in which the failure has occurred.

図８は、部分構成情報提示部２１０により表示装置２３に提示される画面８０の例示図である。トポロジー図表示領域８１には、監視対象システム２００に含まれるコンポーネントの依存関係を示すトポロジー図を表示する。シンプトンリスト表示領域８２には、シンプトンデータベース１３１に記憶されているシンプトンが一覧表示される。例えばシンプトンリスト表示領域８２に表示されているシンプトン群の中から、障害発生イベントの検出に適用するべきシンプトンの選択を受け付けた場合、選択を受け付けたシンプトンに付加されている部分構成情報が部分構成情報表示領域８３に表示される。 FIG. 8 is an exemplary diagram of a screen 80 presented on the display device 23 by the partial configuration information presentation unit 210. The topology diagram display area 81 displays a topology diagram showing the dependency relationships of the components included in the monitoring target system 200. The symptom list display area 82 displays a list of the symptoms stored in the symptom database 131. For example, when a selection of a symptom to be applied to the detection of a failure occurrence event is received from the symptom group displayed in the symptom list display area 82, the partial configuration information added to the selected symptom is displayed in the partial configuration information display area 83.

トポロジー図表示領域８１に表示されているコンポーネントのうち、部分構成情報表示領域８３に表示されている部分構成情報と一致しているコンポーネントは、シンプトンの選択を受け付けた時点で強調表示される。図８では、選択を受け付けたシンプトン及び該シンプトンに付加されている部分構成情報に対応する部分の表示色を変更して表示している。強調表示の方法は特に限定されるものではなく、輝度を変更しても良い。 Among the components displayed in the topology diagram display area 81, the components that match the partial configuration information displayed in the partial configuration information display area 83 are highlighted when the selection of the symptom is accepted. In FIG. 8, the display color of the part corresponding to the symptom that has been selected and the partial configuration information added to the symptom is changed and displayed. The highlighting method is not particularly limited, and the luminance may be changed.

一致度算出部７０１は、構成情報取得部２０２で取得したシステム構成情報と、シンプトン更新部２０９でシンプトンに付加して記憶されている部分構成情報とを比較して、シンプトンデータベース１３１に記憶されている部分構成情報ごとに両者の一致度を算出する。 The degree-of-match calculation unit 701 compares the system configuration information acquired by the configuration information acquisition unit 202 with the partial configuration information that the symptom update unit 209 has added to the symptoms and stored, and calculates the degree of coincidence between the two for each piece of partial configuration information stored in the symptom database 131.

シンプトン抽出部７０２は、一致度算出部７０１で算出された一致度に基づいて、イベント検出部２０６で適用するシンプトンを抽出する。すなわち、一致度が高いシンプトンの適用優先順位を高くすることにより、障害が発生したイベントの検出精度を高める。 The symptom extraction unit 702 extracts the symptom to be applied by the event detection unit 206 based on the coincidence degree calculated by the coincidence degree calculation unit 701. In other words, by increasing the application priority of a symptom having a high degree of coincidence, the detection accuracy of an event in which a failure has occurred is increased.

イベント検出部２０６は、一致度が高いシンプトンから順次適用して、障害が発生したイベントを検出する。一致度が高いシンプトンの適用優先順位を高めることにより、誤検出の可能性を低減し、障害が発生したイベントの検出精度を高めることができる。 The event detection unit 206 applies the symptoms sequentially, starting from the symptom with the highest degree of coincidence, to detect the event in which the failure has occurred. Raising the application priority of symptoms with a high degree of coincidence reduces the possibility of erroneous detection and improves the detection accuracy of the event in which the failure has occurred.

図９は、本発明の実施の形態２に係る障害イベント検出装置１のＣＰＵ１１のシンプトン抽出処理の手順を示すフローチャートである。障害イベント検出装置１のＣＰＵ１１は、監視対象システム２００に含まれるコンポーネント間の関連情報を含むシステム構成情報を取得する（ステップＳ９０１）。もちろん、事前にシステム構成情報を取得して、構成情報記憶部１３２に記憶しておいても良い。 FIG. 9 is a flowchart showing a procedure of symptom extraction processing of the CPU 11 of the failure event detection apparatus 1 according to Embodiment 2 of the present invention. The CPU 11 of the failure event detection apparatus 1 acquires system configuration information including related information between components included in the monitoring target system 200 (step S901). Of course, the system configuration information may be acquired in advance and stored in the configuration information storage unit 132.

ＣＰＵ１１は、シンプトンデータベース１３１に記憶されているシンプトンから一のシンプトンを読み出し（ステップＳ９０２）、読み出したシンプトンに付加されている部分構成情報の、システム構成情報との一致度を算出する（ステップＳ９０３）。一致度の算出方法は特に限定されるものではなく、一実施例については後述する。 The CPU 11 reads one symptom from the symptoms stored in the symptom database 131 (step S902), and calculates the degree of coincidence between the partial configuration information added to the read symptom and the system configuration information (step S903). . The method for calculating the degree of coincidence is not particularly limited, and one embodiment will be described later.

ＣＰＵ１１は、算出した一致度の昇順にシンプトンをソートし（ステップＳ９０４）、記憶されている全てのシンプトンを読み出したか否かを判断する（ステップＳ９０５）。ＣＰＵ１１が、まだ読み出していないシンプトンが存在すると判断した場合（ステップＳ９０５：ＮＯ）、ＣＰＵ１１は、次のシンプトンを読み出し（ステップＳ９０６）、処理をステップＳ９０３へ戻して上述した処理を繰り返す。ＣＰＵ１１が、全てのシンプトンを読み出したと判断した場合（ステップＳ９０５：ＹＥＳ）、ＣＰＵ１１は、一致度の高い順にシンプトンを抽出して（ステップＳ９０７）、障害発生イベントの検出に適用する。 The CPU 11 sorts the symptoms in ascending order of the calculated degree of coincidence (step S904), and determines whether all stored symptoms have been read (step S905). If the CPU 11 determines that there is a symptom that has not yet been read (step S905: NO), the CPU 11 reads the next symptom (step S906), returns the process to step S903, and repeats the above-described process. When the CPU 11 determines that all symptoms have been read (step S905: YES), the CPU 11 extracts the symptoms in descending order of coincidence (step S907) and applies them to the detection of the failure event.
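
The loop of steps S902 to S907 amounts to scoring every stored symptom and then applying them highest score first. The text sorts in ascending order in step S904 and extracts in descending order of coincidence in step S907; the sketch below simply sorts once in descending order, which yields the same application order. The function name `prioritize_symptoms` and the toy scoring function are hypothetical.

```python
from typing import Callable, List


def prioritize_symptoms(
    symptoms: List[dict],
    system_config: dict,
    match_degree: Callable[[dict, dict], float],
) -> List[dict]:
    """Return the symptoms ordered from highest to lowest degree of coincidence."""
    scored = [
        (match_degree(symptom["partial_config"], system_config), symptom)
        for symptom in symptoms
    ]
    # sort() is stable, so symptoms with equal scores keep their stored order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [symptom for _, symptom in scored]


def toy_match_degree(partial_config: dict, system_config: dict) -> float:
    """Toy score for illustration only: percentage of listed components that exist."""
    wanted = partial_config["components"]
    present = [name for name in wanted if name in system_config["components"]]
    return 100.0 * len(present) / len(wanted)


system = {"components": ["Application Server A", "Database B"]}
stored = [
    {"id": "S1", "partial_config": {"components": ["Web Server C", "Database D"]}},
    {"id": "S2", "partial_config": {"components": ["Application Server A", "Database B"]}},
    {"id": "S3", "partial_config": {"components": ["Application Server A", "Database D"]}},
]
ordered = prioritize_symptoms(stored, system, toy_match_degree)
```

Passing the scoring function as a parameter keeps the ordering logic independent of how the degree of coincidence is actually computed.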

図１０及び図１１は、本発明の実施の形態２に係る障害イベント検出装置１のＣＰＵ１１の一致度の算出処理の手順を示すフローチャートである。図１０において、障害イベント検出装置１のＣＰＵ１１は、一致度の最大値Ｎmax 、最小値Ｎmin を初期化する（ステップＳ１００１）。本実施の形態２では、最大値Ｎmax を１００に、最小値Ｎmin を０に設定する。ＣＰＵ１１は、最大値Ｎmax を部分構成情報に含まれるコンポーネント及びリンクに割り当てる（ステップＳ１００２）。割り当て方法によって、コンポーネント及びリンクの重要度の重み付けをすることができる。 10 and 11 are flowcharts showing the procedure of the degree of coincidence calculation processing of the CPU 11 of the failure event detection apparatus 1 according to Embodiment 2 of the present invention. In FIG. 10, the CPU 11 of the failure event detection apparatus 1 initializes the maximum value Nmax and the minimum value Nmin of coincidence (step S1001). In the second embodiment, the maximum value Nmax is set to 100, and the minimum value Nmin is set to 0. The CPU 11 assigns the maximum value Nmax to the component and link included in the partial configuration information (step S1002). The importance of components and links can be weighted according to the allocation method.

ＣＰＵ１１は、読み出したシンプトンに付加されている部分構成情報に含まれるコンポーネントと一致するコンポーネントが有るか否かを判断する（ステップＳ１００３）。一致するコンポーネントが無いと判断した場合（ステップＳ１００３：ＮＯ）、ＣＰＵ１１は、最小値Ｎmin を一致度とする（ステップＳ１００４）。すなわち、一致度Ｎは０（ゼロ）となる。 The CPU 11 determines whether there is a component that matches the component included in the partial configuration information added to the read symptom (step S1003). If it is determined that there is no matching component (step S1003: NO), the CPU 11 sets the minimum value Nmin as the matching degree (step S1004). That is, the coincidence degree N is 0 (zero).

ＣＰＵ１１が、一致するコンポーネントが有ると判断した場合（ステップＳ１００３：ＹＥＳ）、ＣＰＵ１１は、一のコンポーネントを選択する（ステップＳ１００５）。ＣＰＵ１１は、選択したコンポーネントの属性に応じてコンポーネントの割当値に乗ずる係数αを特定する（ステップＳ１００６）。例えば、コンポーネントのタイプが一致している場合はα＝０．１、コンポーネントの製品名が一致している場合はα＝０．６、コンポーネントのバージョンが上位互換で該当している場合はα＝０．８、バージョンが一致している場合はα＝１．０等のように特定すれば良い。 When the CPU 11 determines that there is a matching component (step S1003: YES), the CPU 11 selects one component (step S1005). The CPU 11 specifies a coefficient α by which the component allocation value is multiplied, in accordance with the attributes of the selected component (step S1006). For example, α may be specified as 0.1 when the component type matches, 0.6 when the component product name matches, 0.8 when the component version qualifies through upward compatibility, and 1.0 when the version matches.
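
The example values of α can be read as a ladder from the weakest to the strongest kind of match. The sketch below is one possible interpretation: it assumes versions are comparable tuples, that a newer version counts as upward compatible, and that a component whose type does not match scores 0.0. None of these three assumptions is stated in the text, and the function and field names are hypothetical.

```python
from typing import Dict, Tuple, Union

Component = Dict[str, Union[str, Tuple[int, ...]]]


def alpha(expected: Component, actual: Component) -> float:
    """Pick the coefficient for one component using the example values in the text."""
    if expected["type"] != actual["type"]:
        return 0.0  # assumption: no match at all
    if expected["product"] != actual["product"]:
        return 0.1  # only the component type matches
    if expected["version"] == actual["version"]:
        return 1.0  # same product, same version
    if actual["version"] > expected["version"]:
        return 0.8  # assumption: a newer version is upward compatible
    return 0.6  # product name matches, version does not qualify


wanted = {"type": "database", "product": "ExampleDB", "version": (9, 1)}

same_version = alpha(wanted, {"type": "database", "product": "ExampleDB", "version": (9, 1)})
newer_version = alpha(wanted, {"type": "database", "product": "ExampleDB", "version": (9, 5)})
older_version = alpha(wanted, {"type": "database", "product": "ExampleDB", "version": (8, 2)})
other_product = alpha(wanted, {"type": "database", "product": "OtherDB", "version": (9, 1)})
```

`ExampleDB` and `OtherDB` are placeholder product names used only to exercise each branch.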

ＣＰＵ１１は、コンポーネントの割当値に係数αを乗算してコンポーネントの一致度Ｎ１を算出し（ステップＳ１００７）、一致するすべてのコンポーネントを選択したか否かを判断する（ステップＳ１００８）。ＣＰＵ１１が、まだ選択されていないコンポーネントが有ると判断した場合（ステップＳ１００８：ＮＯ）、ＣＰＵ１１は、次のコンポーネントを選択し（ステップＳ１００９）、処理をステップＳ１００６へ戻して上述した処理を繰り返す。 The CPU 11 multiplies the component allocation value by the coefficient α to calculate the component matching degree N1 (step S1007), and determines whether all matching components have been selected (step S1008). If the CPU 11 determines that there is a component that has not yet been selected (step S1008: NO), the CPU 11 selects the next component (step S1009), returns the process to step S1006, and repeats the above-described process.

ＣＰＵ１１が、すべてのコンポーネントを選択したと判断した場合（ステップＳ１００８：ＹＥＳ）、図１１に示すようにＣＰＵ１１は、関連するリンクが有るか否かを判断する（ステップＳ１１０１）。ＣＰＵ１１が、関連するリンクが有ると判断した場合（ステップＳ１１０１：ＹＥＳ）、ＣＰＵ１１は、一のリンクを選択し（ステップＳ１１０２）、選択したリンクの属性に応じてリンクの割当値に乗ずる係数βを特定する（ステップＳ１１０３）。 If the CPU 11 determines that all components have been selected (step S1008: YES), the CPU 11 determines whether there is a related link as shown in FIG. 11 (step S1101). When the CPU 11 determines that there is a related link (step S1101: YES), the CPU 11 selects one link (step S1102), and calculates a coefficient β by which the link allocation value is multiplied according to the attribute of the selected link. Specify (step S1103).

例えばコンポーネント間のリンクが、該当ルールを検出するために存在が必須である必須リンクと一致している場合にはβ＝１．０、必須リンクが間接的に存在している場合にはβ＝０．６、存在してもしなくても良い任意リンクである場合にはβ＝１．０、任意リンクが間接的に存在している場合にはβ＝０．８、存在してはならない禁止リンクが存在しない場合にはβ＝１．０、禁止リンクが間接的に存在している場合にはβ＝０．１、リンクの種類が一致している場合にはβ＝１．０、互換性がある場合にはβ＝０．８等のように特定すれば良い。 For example, β may be specified as follows: β = 1.0 when a link between components matches a required link, that is, a link whose presence is mandatory for detecting the rule in question; β = 0.6 when a required link is present only indirectly; β = 1.0 for an optional link, which may or may not be present; β = 0.8 when an optional link is present indirectly; β = 1.0 when a prohibited link, which must not be present, is absent; β = 0.1 when a prohibited link is present indirectly; β = 1.0 when the link types match; and β = 0.8 when the link types are compatible.
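
The example values of β lend themselves to a lookup table. The sketch below is one reading of the listed cases; the key names (`required`, `optional`, `prohibited`, `direct`, `indirect`, `absent`) and the fallback value of 0.0 for combinations the text does not mention are assumptions made for illustration.

```python
# (link requirement, how the link appears in the system configuration) -> beta
BETA_BY_PRESENCE = {
    ("required", "direct"): 1.0,
    ("required", "indirect"): 0.6,
    ("optional", "direct"): 1.0,
    ("optional", "indirect"): 0.8,
    ("prohibited", "absent"): 1.0,
    ("prohibited", "indirect"): 0.1,
}

# How closely the link type agrees -> beta
BETA_BY_LINK_TYPE = {
    "same": 1.0,
    "compatible": 0.8,
}


def beta(requirement: str, presence: str, default: float = 0.0) -> float:
    """Look up the coefficient; unlisted combinations fall back to an assumed default."""
    return BETA_BY_PRESENCE.get((requirement, presence), default)


required_via_intermediate = beta("required", "indirect")
prohibited_but_absent = beta("prohibited", "absent")
not_covered_by_text = beta("required", "absent")
```

A table keeps the example values in one place, so they can be tuned without touching the calculation that multiplies them by the link allocation value.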

ＣＰＵ１１は、リンクの割当値に係数βを乗算してリンクの一致度Ｎ２を算出し（ステップＳ１１０４）、関連するすべてのリンクを選択したか否かを判断する（ステップＳ１１０５）。ＣＰＵ１１が、選択されていないリンクが存在すると判断した場合（ステップＳ１１０５：ＮＯ）、ＣＰＵ１１は、次のリンクを選択し（ステップＳ１１０６）、処理をステップＳ１１０３へ戻して上述した処理を繰り返す。 The CPU 11 multiplies the link allocation value by the coefficient β to calculate the link matching degree N2 (step S1104), and determines whether or not all related links have been selected (step S1105). When the CPU 11 determines that there is an unselected link (step S1105: NO), the CPU 11 selects the next link (step S1106), returns the process to step S1103, and repeats the above-described process.

ＣＰＵ１１が、関連するリンクが無いと判断した場合（ステップＳ１１０１：ＮＯ）、ＣＰＵ１１は、リンクの一致度Ｎ２を０（ゼロ）とし（ステップＳ１１０７）、処理をステップＳ１１０８へ進める。ＣＰＵ１１が、すべてのリンクを選択したと判断した場合（ステップＳ１１０５：ＹＥＳ）、ＣＰＵ１１は、部分構成情報全体の一致度Ｎを、コンポーネントの一致度Ｎ１とリンクの一致度Ｎ２との和として算出し（ステップＳ１１０８）、連続リンクが有るか否かを判断する（ステップＳ１１０９）。 When the CPU 11 determines that there is no related link (step S1101: NO), the CPU 11 sets the link coincidence degree N2 to 0 (zero) (step S1107), and advances the processing to step S1108. When the CPU 11 determines that all the links have been selected (step S1105: YES), the CPU 11 calculates the matching degree N of the entire partial configuration information as the sum of the matching degree N1 of the component and the matching degree N2 of the link. (Step S1108), it is determined whether or not there is a continuous link (Step S1109).

ＣＰＵ１１が、連続リンクが有ると判断した場合（ステップＳ１１０９：ＹＥＳ）、ＣＰＵ１１は、所定の評価値Ｎ３を部分構成情報全体の一致度Ｎに加算する（ステップＳ１１１０）。ＣＰＵ１１が、連続リンクが無いと判断した場合（ステップＳ１１０９：ＮＯ）、ＣＰＵ１１は、ステップＳ１１１０をスキップして処理を終了する。 When the CPU 11 determines that there is a continuous link (step S1109: YES), the CPU 11 adds a predetermined evaluation value N3 to the matching degree N of the entire partial configuration information (step S1110). When the CPU 11 determines that there is no continuous link (step S1109: NO), the CPU 11 skips step S1110 and ends the process.
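
The flow of FIGS. 10 and 11 reduces to a weighted sum: N = N1 + N2, plus N3 when a continuous link exists. The sketch below assumes that N1 and N2 are accumulated over all matching components and related links, which the flowcharts imply but do not state outright, and the function signature is hypothetical.

```python
import math
from typing import List, Tuple

Weighted = Tuple[float, float]  # (allocation value, coefficient)


def match_degree(
    components: List[Weighted],
    links: List[Weighted],
    has_continuous_link: bool = False,
    n3: float = 0.0,
    n_min: float = 0.0,
) -> float:
    """Compute the degree of coincidence N following steps S1003 to S1110."""
    if not components:
        return n_min  # S1004: no matching component
    n1 = sum(allocation * alpha for allocation, alpha in components)  # S1007
    n2 = sum(allocation * beta for allocation, beta in links)  # S1104; 0 when no links (S1107)
    n = n1 + n2  # S1108
    if has_continuous_link:
        n += n3  # S1110
    return n


# Two components (one matched through upward compatibility), one direct link,
# and a hypothetical bonus of 5 for a continuous link.
score = match_degree(
    components=[(20, 1.0), (20, 0.8)],
    links=[(60, 1.0)],
    has_continuous_link=True,
    n3=5,
)
no_match = match_degree(components=[], links=[(60, 1.0)])
```

Note that with a nonzero N3 the result can exceed Nmax; the text does not say whether the sum is capped, so the sketch leaves it uncapped.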

上述した一致度の算出方法を、具体例に基づいて説明する。一致度を算出する基礎となる部分構成情報は、アプリケーションサーバＡとデータベースＢとの２つのコンポーネントと、アプリケーションサーバＡとデータベースＢとの間のリンクとを有している。図１０及び図１１と同様、Ｎmax ＝１００、Ｎmin ＝０とし、重要度をアプリケーションサーバＡ：データベースＢ：リンク＝１：１：３とする。したがって、アプリケーションサーバＡの割当値は２０、データベースＢの割当値は２０、リンクの割当値は６０とＮmax が割り当てられる。 The above-described method for calculating the degree of coincidence will be described based on a specific example. The partial configuration information that is the basis for calculating the degree of coincidence includes two components, the application server A and the database B, and a link between the application server A and the database B. Similar to FIGS. 10 and 11, Nmax = 100 and Nmin = 0, and the importance is assumed to be application server A: database B: link = 1: 1: 3. Therefore, 20 is assigned to the application server A, 20 is assigned to the database B, and 60 is assigned to the link.
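
The allocation of Nmax according to the importance ratio 1:1:3 is a simple proportional split. A minimal sketch follows; the function name `allocate` is hypothetical.

```python
from typing import Dict


def allocate(n_max: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split n_max among the elements in proportion to their importance weights."""
    total = sum(weights.values())
    return {name: n_max * weight / total for name, weight in weights.items()}


allocation = allocate(
    100,
    {"Application Server A": 1, "Database B": 1, "link": 3},
)
```

With these weights the split is 20 for each component and 60 for the link, and the three values add up to Nmax = 100.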

図１２は、部分構成情報と同一の構成がシステム構成情報に存在する場合の一致度算出例を示す模式図である。図１２の例では、同一のコンポーネントであるアプリケーションサーバＡとデータベースＢ２とが存在することから、コンポーネントの一致度はそれぞれ‘２０’であり、アプリケーションサーバＡとデータベースＢとの間のリンクも存在することから、リンクの一致度も‘６０’となる。したがって、図１２に示す部分構成情報の一致度Ｎは２０＋２０＋６０＝１００となる。 FIG. 12 is a schematic diagram illustrating an example of calculating the degree of coincidence when the same configuration as the partial configuration information exists in the system configuration information. In the example of FIG. 12, since the application server A and the database B2 which are the same components exist, the degree of coincidence between the components is “20”, and there is also a link between the application server A and the database B. For this reason, the matching degree of the link is also “60”. Accordingly, the degree of coincidence N of the partial configuration information shown in FIG. 12 is 20 + 20 + 60 = 100.

図１３は、部分構成情報のリンクがシステム構成情報に間接的に存在する場合の一致度算出例を示す模式図である。図１３の例では、同一のコンポーネントであるアプリケーションサーバＡとデータベースＢ１とが存在することから、コンポーネントの一致度はそれぞれ‘２０’であるが、アプリケーションサーバＡとデータベースＢとの間のリンクがコンポーネントＫを介する間接的なリンクであることから、リンクの一致度Ｎ２＝０．６×６０＝３６となる。したがって、図１３に示す部分構成情報の一致度Ｎは２０＋２０＋３６＝７６となる。 FIG. 13 is a schematic diagram illustrating an example of calculating the degree of coincidence when the link of the partial configuration information exists indirectly in the system configuration information. In the example of FIG. 13, since the application server A and the database B1 which are the same components exist, the degree of coincidence between the components is “20”, but the link between the application server A and the database B is the component. Since the link is an indirect link via K, the degree of matching N2 = 0.6 × 60 = 36. Accordingly, the coincidence degree N of the partial configuration information shown in FIG. 13 is 20 + 20 + 36 = 76.

図１４は、部分構成情報と同一のコンポーネントは存在するが、リンクが存在しない場合の一致度算出例を示す模式図である。図１４の例では、同一のコンポーネントであるアプリケーションサーバＡとデータベースＢ１とが存在することから、コンポーネントの一致度はそれぞれ‘２０’であるが、アプリケーションサーバＡとデータベースＢ１との間にリンクが存在しないことから、リンクの一致度は‘０’となる。したがって、図１４に示す部分構成情報の一致度Ｎは２０＋２０＋０＝４０となる。 FIG. 14 is a schematic diagram illustrating an example of calculating the degree of coincidence when the same component as the partial configuration information exists but no link exists. In the example of FIG. 14, since the application server A and the database B1 which are the same components exist, the degree of coincidence of the components is “20”, but there is a link between the application server A and the database B1. As a result, the matching degree of the link is “0”. Accordingly, the degree of coincidence N of the partial configuration information shown in FIG. 14 is 20 + 20 + 0 = 40.

図１５は、部分構成情報と同一のコンポーネントが存在しない場合の一致度算出例を示す模式図である。図１５の例では、同一のコンポーネントが存在しないことから、コンポーネントの一致度はそれぞれ‘０’であり、当然のことながらアプリケーションサーバＡとデータベースＢとの間のリンクも存在しないことから、リンクの一致度も‘０’となる。したがって、図１５に示す部分構成情報の一致度Ｎは０となる。 FIG. 15 is a schematic diagram illustrating an example of calculating the degree of coincidence when no component identical to those in the partial configuration information exists. In the example of FIG. 15, since no identical component exists, the degree of coincidence of each component is '0', and since there is naturally no link between application server A and database B either, the degree of coincidence of the link is also '0'. Accordingly, the degree of coincidence N of the partial configuration information shown in FIG. 15 is 0.
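
The four cases of FIGS. 12 to 15 can be checked side by side with a few lines of arithmetic. The sketch below hard-codes the allocation values 20, 20, and 60 from the example and expresses each figure as a triple of coefficients; the dictionary layout is an assumption made for illustration.

```python
ALLOCATION = {"app_server": 20, "database": 20, "link": 60}

# Coefficients per figure: (alpha for the server, alpha for the database, beta for the link)
CASES = {
    "FIG. 12": (1.0, 1.0, 1.0),  # identical components, direct link
    "FIG. 13": (1.0, 1.0, 0.6),  # identical components, indirect link via component K
    "FIG. 14": (1.0, 1.0, 0.0),  # identical components, no link
    "FIG. 15": (0.0, 0.0, 0.0),  # no identical component
}

results = {}
for figure, (alpha_server, alpha_database, beta_link) in CASES.items():
    total = (
        ALLOCATION["app_server"] * alpha_server
        + ALLOCATION["database"] * alpha_database
        + ALLOCATION["link"] * beta_link
    )
    results[figure] = round(total)
```

Ordering the symptoms by these values puts the configuration of FIG. 12 first and that of FIG. 15 last.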

以上のように本実施の形態２によれば、取得したシステム構成情報と、シンプトンに付加して記憶されている部分構成情報とを比較して算出した一致度が大きい部分構成情報に対応するシンプトンを適用して障害が発生したイベントを検出することができ、不要なシンプトンを適用することなく、効率的に障害が発生している根本原因を特定することができる。 As described above, according to the second embodiment, the symptom corresponding to the partial configuration information having a high degree of coincidence calculated by comparing the acquired system configuration information with the partial configuration information stored in addition to the symptom. Can be used to detect the event in which the failure has occurred, and the root cause of the failure can be identified efficiently without applying unnecessary symptoms.

（実施の形態３） (Embodiment 3)
本発明の実施の形態３に係る障害イベントの検出を支援する装置を含む障害イベント検出装置１の構成は、実施の形態１及び２と同様であることから、同一の符号を付することにより詳細な説明は省略する。本実施の形態３は、抽出された部分構成情報と、イベント群を特定するのに用いたシンプトンに含まれている部分構成情報との適合度を算出してシンプトンの検出結果を評価する点で実施の形態１及び２と相違する。 Since the configuration of the failure event detection apparatus 1 including the apparatus that supports the detection of failure events according to Embodiment 3 of the present invention is the same as that of Embodiments 1 and 2, the same reference numerals are used and a detailed description is omitted. Embodiment 3 differs from Embodiments 1 and 2 in that the symptom detection result is evaluated by calculating the degree of fitness between the extracted partial configuration information and the partial configuration information included in the symptom used to identify the event group.

図１６は、本発明の実施の形態３に係る障害イベント検出装置１の機能ブロック図である。構成情報抽出部２０１は、監視対象システム２００に含まれるコンポーネント間の関連情報を含むシステム構成情報を抽出して、構成情報記憶部１３２に記憶する。コンポーネント間の関連情報を含むシステム構成情報とは、例えばコンポーネント間の通信における接続関係情報、操作・非操作の関係に関するリンク関係情報等である。なお、構成情報抽出部２０１は本発明に必須の構成要件ではなく、事前に構成情報記憶部１３２にシステム構成情報を生成しておいても良く、障害イベント検出装置１内に内蔵していてもいなくても良い。すなわち、構成情報抽出部２０１及び構成情報記憶部１３２は、本発明の実施の形態３に係る障害イベント検出装置１の必須の構成要件ではない。 FIG. 16 is a functional block diagram of the failure event detection apparatus 1 according to Embodiment 3 of the present invention. The configuration information extraction unit 201 extracts system configuration information including related information between the components included in the monitoring target system 200 and stores it in the configuration information storage unit 132. The system configuration information including related information between components is, for example, connection relationship information for communication between components, or link relationship information regarding operation and non-operation relationships. Note that the configuration information extraction unit 201 is not an essential component of the present invention: the system configuration information may be generated in the configuration information storage unit 132 in advance, and the unit may or may not be built into the failure event detection apparatus 1. That is, the configuration information extraction unit 201 and the configuration information storage unit 132 are not essential components of the failure event detection apparatus 1 according to Embodiment 3 of the present invention.

The structure of a symptom including partial configuration information in the failure event detection apparatus 1 according to Embodiment 3 is the same as in Embodiment 1, that is, the structure shown in FIG. 3(b) described above. By adding information on the dependency between the components application server A and database B to a conventional symptom, the dependency between components can also be taken into account when the symptom is used to detect a failure event, which further improves the accuracy of detecting the event in which the failure occurred.

The screen 80 presented on the display device 23 by the partial configuration information presentation unit 210 is the same as in Embodiment 2, so detailed description is omitted.

The event group identification unit 1601 identifies the event groups that match the stored symptoms, based on the history information collected by the history information collection unit 203 and the symptoms stored in the symptom database 131. The extraction unit 1602 extracts partial configuration information, which includes relation information between the component that sent each event group identified by the event group identification unit 1601 and the other components.

The fitness calculation unit 1603 calculates the degree of fitness between the partial configuration information extracted by the extraction unit 1602 and the partial configuration information included in the symptom, stored in the symptom database 131, that the event group identification unit 1601 used to identify the event group. When a symptom with a high fitness has been detected, the possibility that the event in which the failure occurred has been erroneously detected can be judged to be low. In addition, the event in which the failure occurred can be detected with consistent accuracy regardless of how skilled the user is at applying symptoms. That is, by applying a symptom with a high fitness to detect the event in which the failure occurred, the event detection unit 206 reduces the possibility of erroneous detection and increases the detection accuracy for failure events.

The presentation unit 1604 presents the applied symptom, the partial configuration information added to that symptom, and the calculated fitness of the symptom on the display device 23. Because the results of applying symptoms are presented in descending order of fitness rather than merely in the order of application, erroneous detection results are less likely to be shown on the display device 23, and symptoms can be displayed starting from the one with the highest event detection accuracy.
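The ordering described above can be illustrated with a minimal Python sketch. The `AppliedSymptom` record, its field names, and the sample values are assumptions made for illustration only; the specification does not define a data format for the presentation unit 1604.

```python
from dataclasses import dataclass


@dataclass
class AppliedSymptom:
    name: str                   # hypothetical symptom identifier
    partial_configuration: str  # simplified to a label for this sketch
    fitness: float              # fitness calculated for this application


def order_for_presentation(results):
    """Return the applied symptoms from highest to lowest fitness."""
    # sorted() is stable, so symptoms with equal fitness keep their application order.
    return sorted(results, key=lambda r: r.fitness, reverse=True)


applied = [
    AppliedSymptom("symptom-1", "A1 -> B2", 40.0),
    AppliedSymptom("symptom-2", "A1 -> B1", 100.0),
    AppliedSymptom("symptom-3", "A1 -> C1 -> B1", 76.0),
]
for row in order_for_presentation(applied):
    print(f"{row.fitness:6.1f}  {row.name}  {row.partial_configuration}")
```

Sorting a copy rather than in place keeps the original application order available if both views are needed.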

FIG. 17 shows an example of the screen 160 presented on the display device 23 by the presentation unit 1604. The topology diagram display area 161 displays a topology diagram showing the dependencies between the components of the monitoring target system 200. The event list display area 162 lists the events included in the monitoring target system 200. When the event selection accepting unit 207 accepts, from the event group displayed in the event list display area 162, the selection of an event belonging to a component related to the failure that occurred, the selected event and the events that have a dependency relationship with it are highlighted. In FIG. 17, the selected event and the dependent events are shown in a different display color. The highlighting method is not particularly limited; for example, the luminance may be changed instead.

The symptom list display area 163 lists the symptoms detected on the basis of the detection rules, and the partial configuration information display area 164 displays the partial configuration information added to the applied symptom. The fitness is calculated by comparing the partial configuration information displayed in the partial configuration information display area 164 with the partial configuration information displayed in the topology diagram display area 161 once the selection of an event group has been accepted. The calculated fitness is presented in the column of the corresponding symptom in the symptom list display area 163.

FIG. 18 is a flowchart showing the failure detection procedure executed by the CPU 11 of the failure event detection apparatus 1 according to Embodiment 3 of the present invention. The CPU 11 of the failure event detection apparatus 1 acquires system configuration information including relation information between the components of the monitoring target system 200 (step S1801). The system configuration information may of course be acquired in advance and stored in the configuration information storage unit 132.

The CPU 11 collects history information, which includes log information of the monitoring target system 200 and/or failure information output by each component when a failure occurs (step S1802), and stores it in the history information storage unit 133 (step S1803). Based on the collected history information and the symptoms stored in the symptom database 131, the CPU 11 identifies the event groups that match the stored symptoms (step S1804).

The CPU 11 extracts partial configuration information, which includes relation information between the component that sent each identified event group and the other components (step S1805). It then calculates the degree of fitness between the extracted partial configuration information and the partial configuration information added to the symptom, stored in the symptom database 131, that was applied to identify the event group (step S1806). The method of calculating the fitness is not particularly limited; one example is described later.
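The extraction in step S1805 can be pictured as selecting a subgraph of the system configuration. The sketch below is one possible reading, not the method fixed by the specification: it assumes the system configuration is a list of component pairs and that each event records the component that sent it. The function name and the data shapes are hypothetical.

```python
def extract_partial_configuration(links, event_group):
    """Collect the links touching any component that sent an event in the group.

    links: iterable of (component, component) pairs from the system configuration.
    event_group: iterable of dicts with a "component" key naming the sender.
    """
    senders = {event["component"] for event in event_group}
    # Keep every link that has at least one sending component as an endpoint.
    partial_links = [link for link in links if senders & set(link)]
    components = set(senders)
    for link in partial_links:
        components.update(link)
    return components, partial_links


system_links = [("A1", "B1"), ("A2", "B2"), ("B2", "C1")]
events = [{"id": "evt-1", "component": "A1"}]
components, partial_links = extract_partial_configuration(system_links, events)
print(sorted(components), partial_links)
```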

The CPU 11 presents the applied symptom, the partial configuration information added to that symptom, and the calculated fitness of the symptom on the display device 23 (step S1807). The user can thus visually check the results, presented in descending order of symptom fitness.

FIGS. 19 and 20 are flowcharts showing the fitness calculation procedure executed by the CPU 11 of the failure event detection apparatus 1 according to Embodiment 3 of the present invention. In FIG. 19, the CPU 11 initializes the maximum value Pmax and the minimum value Pmin of the fitness (step S1901). In Embodiment 3, Pmax is set to 100 and Pmin is set to 0. The CPU 11 distributes the maximum value Pmax among the components and links included in the partial configuration information (step S1902). The way Pmax is distributed determines how the importance of each component and link is weighted.

The CPU 11 determines whether there is any component that matches a component included in the partial configuration information added to the symptom that was read out (step S1903). When it determines that there is no matching component (step S1903: NO), the CPU 11 uses the minimum value Pmin as the fitness (step S1904). That is, the fitness P is 0 (zero).

When the CPU 11 determines that there is a matching component (step S1903: YES), it selects one component (step S1905). The CPU 11 then specifies the coefficient α by which the allocated value of the component is multiplied, according to the attributes of the selected component (step S1906). For example, α = 0.1 when only the component type matches, α = 0.6 when the product name matches, α = 0.8 when the version is upward compatible, and α = 1.0 when the version matches exactly.
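The example values of α can be written as a small lookup function. This is only a sketch: the attribute names (`type`, `product`, `version`), the use of version tuples, the rule that a newer version counts as upward compatible, and the value 0.0 for a type mismatch are all assumptions added for illustration; the specification gives only the four example coefficients.

```python
def component_coefficient(expected, actual):
    """Return alpha for one component pair using the example values in the text."""
    if expected["type"] != actual["type"]:
        return 0.0  # assumption: the text gives no value for a type mismatch
    if expected["product"] != actual["product"]:
        return 0.1  # only the component type matches
    if actual["version"] == expected["version"]:
        return 1.0  # versions match exactly
    if actual["version"] > expected["version"]:
        return 0.8  # assumption: a newer version is treated as upward compatible
    return 0.6      # product name matches, version is not compatible


expected = {"type": "database", "product": "ExampleDB", "version": (9, 1)}
same = {"type": "database", "product": "ExampleDB", "version": (9, 1)}
newer = {"type": "database", "product": "ExampleDB", "version": (9, 5)}
older = {"type": "database", "product": "ExampleDB", "version": (8, 0)}
other_product = {"type": "database", "product": "OtherDB", "version": (9, 1)}

for candidate in (same, newer, older, other_product):
    print(candidate["product"], candidate["version"], component_coefficient(expected, candidate))
```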

The CPU 11 multiplies the allocated value of the component by the coefficient α to calculate the component fitness P1 (step S1907), and determines whether all matching components have been selected (step S1908). When the CPU 11 determines that a component has not yet been selected (step S1908: NO), it selects the next component (step S1909), returns to step S1906, and repeats the processing described above.

When the CPU 11 determines that all components have been selected (step S1908: YES), it determines, as shown in FIG. 20, whether there is a related link (step S2001). When the CPU 11 determines that there is a related link (step S2001: YES), it selects one link (step S2002) and specifies the coefficient β by which the allocated value of the link is multiplied, according to the attributes of the selected link (step S2003).

The CPU 11 multiplies the allocated value of the link by the coefficient β to calculate the link fitness P2 (step S2004), and determines whether all related links have been selected (step S2005). When the CPU 11 determines that an unselected link remains (step S2005: NO), it selects the next link (step S2006), returns to step S2003, and repeats the processing described above.

When the CPU 11 determines that there is no related link (step S2001: NO), it sets the link fitness P2 to 0 (zero) (step S2007) and advances to step S2008. When the CPU 11 determines that all links have been selected (step S2005: YES), it calculates the fitness P of the entire partial configuration information as the sum of the component fitness P1 and the link fitness P2 (step S2008), and then determines whether there is a continuous link (step S2009).

When the CPU 11 determines that there is a continuous link (step S2009: YES), it adds a predetermined evaluation value P3 to the fitness P of the entire partial configuration information (step S2010). When the CPU 11 determines that there is no continuous link (step S2009: NO), it skips step S2010 and ends the processing.
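The flow of FIGS. 19 and 20 can be condensed into the following Python sketch. It assumes that the per-component and per-link products are summed to give P1 and P2, which is how the worked figures in this embodiment add up but is not stated as a formula in the text. The function name, the argument layout, and the default value of P3 are hypothetical.

```python
P_MAX = 100.0  # step S1901: maximum fitness
P_MIN = 0.0    # step S1901: minimum fitness


def calculate_fitness(components, links, has_continuous_link=False, p3=0.0):
    """Sketch of steps S1903 to S2010.

    components: list of (allocated_value, alpha) for each matching component.
    links: list of (allocated_value, beta) for each related link.
    p3: the predetermined evaluation value added when a continuous link exists.
    """
    if not components:                                   # S1903: NO
        return P_MIN                                     # S1904
    p1 = sum(value * alpha for value, alpha in components)  # S1905 to S1909
    p2 = sum(value * beta for value, beta in links)         # S2001 to S2007 (0 when no link)
    p = p1 + p2                                          # S2008
    if has_continuous_link:                              # S2009
        p += p3                                          # S2010
    return p


# Two fully matching components and one fully matching link.
print(calculate_fitness([(20, 1.0), (20, 1.0)], [(60, 1.0)]))
# Matching components but no related link.
print(calculate_fitness([(20, 1.0), (20, 1.0)], []))
```

Passing the coefficients in as data keeps the sketch independent of how α and β are chosen, which the text leaves open.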

The fitness calculation method described above is now explained using a specific example. The partial configuration information added to the symptom, which serves as the basis for calculating the fitness, consists of two components, application server A and database B, and the link between application server A and database B. As in FIGS. 19 and 20, Pmax = 100 and Pmin = 0, and the importance ratio is application server A : database B : link = 1 : 1 : 3. Pmax is therefore distributed as an allocated value of 20 for application server A, 20 for database B, and 60 for the link.
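The allocation in this example is a proportional split of Pmax by the importance ratio. A minimal sketch, with a hypothetical helper name and dictionary keys chosen for illustration:

```python
def allocate(p_max, weights):
    """Split p_max among the elements in proportion to their importance weights."""
    total = sum(weights.values())
    return {name: p_max * weight / total for name, weight in weights.items()}


allocation = allocate(100, {"application_server_A": 1, "database_B": 1, "link_A_B": 3})
print(allocation)
```

With the 1 : 1 : 3 ratio this yields 20, 20, and 60, matching the allocated values stated above.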

FIG. 21 is a schematic diagram illustrating the concept of the degree of fitness between the partial configuration information added to a symptom and the system configuration information. Assume that the system configuration information contains two links, one between application server A1 and database B1 and one between application server A2 and database B2, and that each has the same attributes as the link between application server A and database B.

In this case, the link 2101 between application server A1 and database B1 and the link 2102 between application server A2 and database B2 have the same attributes as the partial configuration information added when the symptom was generated, and are therefore judged to be similar to it. The fitness P is consequently calculated as a large value.

By contrast, for the link 2103 between application server A1 and database B2 and the link 2104 between application server A2 and database B1, the component attributes match, but the link attributes differ from the partial configuration information added when the symptom was generated. The fitness P is consequently calculated as a small value.

FIG. 22 is a schematic diagram showing an example of the fitness calculation when the system configuration information contains the same configuration as the partial configuration information added when the symptom was generated. In the example of FIG. 22, the same components exist (application server A1 and database B1, application server A2 and database B2), so the fitness of each component is 20. In addition, the link 2101 between application server A1 and database B1 exists and has the same attributes as the link between application server A and database B, so the link fitness is 60. The fitness P of the partial configuration information shown in FIG. 22 is therefore 20 + 20 + 60 = 100.

FIG. 23 is a schematic diagram showing an example of the fitness calculation when the link in the partial configuration information added when the symptom was generated exists only indirectly in the system configuration information. In the example of FIG. 23, the same components exist (application server A1 and database B1, application server A2 and database B2), so the fitness of each component is 20. No link has the same attributes as the link between application server A and database B, but application server A1 and database B1 are connected by a link 2301 that passes through another component C1. The link fitness P2 is therefore 0.6 × 60 = 36, and the fitness P of the partial configuration information shown in FIG. 23 is 20 + 20 + 36 = 76.

FIG. 24 is a schematic diagram showing an example of the fitness calculation when the same components as in the partial configuration information added when the symptom was generated exist, but the link does not. In the example of FIG. 24, the same components exist (application server A1 and database B1, application server A2 and database B2), so the fitness of each component is 20. However, no link has the same attributes as the link between application server A and database B; only the link 2103 between application server A1 and database B2 exists. The link fitness P2 is therefore 0, and the fitness P of the partial configuration information shown in FIG. 24 is 20 + 20 + 0 = 40.
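The three cases of FIGS. 22 to 24 differ only in the coefficient applied to the link. The sketch below reproduces the arithmetic under that reading. The labels `direct`, `indirect`, and `absent` and the table layout are assumptions made for illustration; the coefficient 0.6 for an indirect link is the value used in the FIG. 23 example, and the values 1.0 and 0.0 are inferred from the link fitness of 60 and 0 in FIGS. 22 and 24.

```python
# Allocated values from the 1 : 1 : 3 example (application server, database, link).
ALLOCATION = {"application_server": 20, "database": 20, "link": 60}

# Link coefficient per case: FIG. 22 (direct), FIG. 23 (indirect), FIG. 24 (absent).
LINK_COEFFICIENT = {"direct": 1.0, "indirect": 0.6, "absent": 0.0}


def example_fitness(link_kind):
    """Return P = P1 + P2 when both components match fully (alpha = 1.0)."""
    p1 = ALLOCATION["application_server"] + ALLOCATION["database"]
    p2 = LINK_COEFFICIENT[link_kind] * ALLOCATION["link"]
    return p1 + p2


for kind in ("direct", "indirect", "absent"):
    print(f"{kind:8s} P = {example_fitness(kind):.0f}")
```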

As described above, according to Embodiment 3, adding partial configuration information to a symptom keeps the accuracy of detecting the event in which a failure occurred high. At the same time, the degree of fitness between the extracted partial configuration information and the partial configuration information included in the symptom applied to identify the event group can be calculated, so the detection result can be evaluated objectively regardless of the user's level of skill. Furthermore, because the calculated fitness is presented together with the partial configuration information, the user can assess the detection result objectively on screen.

(Embodiment 4)
The configuration of the failure event detection apparatus 1, which includes the apparatus that supports detection of a failure event according to Embodiment 4 of the present invention, is the same as in Embodiments 1 to 3, so the same reference numerals are used and detailed description is omitted. Embodiment 4 differs from Embodiments 1 to 3 in that partial configuration information is added to the symptom together with information on whether the detection result was correct.

FIG. 25 is a functional block diagram of the failure event detection apparatus 1 according to Embodiment 4 of the present invention. The configuration information extraction unit 201 extracts system configuration information, including relation information between the components of the monitoring target system 200, and stores it in the configuration information storage unit 132. Such system configuration information is, for example, connection relationship information for communication between components, or link relationship information concerning operation/non-operation relationships. The configuration information extraction unit 201 is not an essential element of the present invention: the system configuration information may be generated in the configuration information storage unit 132 in advance, and the unit may or may not be built into the failure event detection apparatus 1. In other words, neither the configuration information extraction unit 201 nor the configuration information storage unit 132 is an essential element of the failure event detection apparatus 1 according to Embodiment 4.

The structure of a symptom including partial configuration information in the failure event detection apparatus 1 according to Embodiment 4 is the same as in Embodiment 1, that is, the structure shown in FIG. 3(b) described above. By adding information on the dependency between the components application server A and database B to a conventional symptom, the dependency between components can also be taken into account when the symptom is used to detect a failure event, which further improves the accuracy of detecting the event in which the failure occurred.

Embodiment 4 assumes that symptoms to which the partial configuration information described above has been added and conventional symptoms without partial configuration information are stored together in the symptom database 131. Needless to say, when a symptom with partial configuration information is applied, adopting the same configuration as in Embodiments 1 to 3 yields the same effects.

In Embodiment 4, therefore, as in Embodiment 3, the event group identification unit 1601 identifies the event groups that match the stored symptoms, based on the history information collected by the history information collection unit 203 and the symptoms stored in the symptom database 131, and the extraction unit 1602 extracts partial configuration information, which includes relation information between the component that sent each identified event group and the other components.

The event detection unit 206 then detects the event in which the failure occurred by applying the partial configuration information extracted by the extraction unit 1602 and the symptom on which identification of the event group was based. The correct/incorrect information acquisition unit 2501 acquires correct/incorrect information indicating whether the detection result of the event detection unit 206 is correct, that is, whether the event in which the failure occurred was detected properly.

The update unit 2502 updates the symptom on which identification of the event group was based by adding the acquired correct/incorrect information and the extracted partial configuration information to it as additional information. In this way, partial configuration information can be added, together with correct/incorrect information, to a symptom that has no partial configuration information.

The correct/incorrect information acquisition unit 2501 may be a correct/incorrect information reception unit 2503 that accepts the user's input of a correct/incorrect judgment. In this case, the user's judgment of whether the failure event displayed on the screen is correct is accepted through, for example, a mouse click on a "Confirm" button or a "Correct" button.

Alternatively, as in Embodiment 3, the fitness may be calculated and the correct/incorrect judgment made according to whether the fitness is greater than a predetermined value. In this case, the fitness calculation unit 2504 calculates the degree of fitness between the partial configuration information extracted by the extraction unit 1602 and the partial configuration information included in the symptom, stored in the symptom database 131, that the event group identification unit 1601 used to identify the event group. When a symptom with a high fitness has been detected, the possibility of erroneous detection in failure determination can be judged to be low. In addition, failure determination can be performed with consistent accuracy regardless of how skilled the user is at applying symptoms. That is, by applying a symptom with a high fitness to detect the event in which the failure occurred, the event detection unit 206 reduces the possibility of erroneous detection and increases the detection accuracy for failure events.

The fitness determination unit 2505 determines whether the calculated fitness is greater than a predetermined value. When the fitness is determined to be greater than the predetermined value, the update unit 2502 adds to the symptom normal detection configuration information, that is, the partial configuration information extracted by the extraction unit 1602 with information indicating normal detection attached. When the fitness is determined to be equal to or less than the predetermined value, the update unit 2502 adds to the symptom erroneous detection configuration information, that is, the extracted partial configuration information with information indicating erroneous detection attached. In either case the symptom database 131 is updated. Preferentially applying symptoms whose partial configuration information carries the normal-detection indication allows failure events to be detected more accurately.
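The threshold rule can be sketched as follows. The dictionary layout of a symptom, the key names, and the threshold of 50 are hypothetical; the specification states only that the comparison is against a predetermined value and that a fitness equal to that value is treated as an erroneous detection.

```python
def record_by_threshold(symptom, partial_configuration, fitness, threshold):
    """Attach the extracted partial configuration to the symptom with a detection flag."""
    # Strictly greater than the threshold counts as a normal detection;
    # equal to or less than the threshold counts as an erroneous detection.
    entry = {
        "partial_configuration": partial_configuration,
        "normal_detection": fitness > threshold,
    }
    symptom.setdefault("configurations", []).append(entry)
    return entry


symptom = {"id": "symptom-1"}  # a symptom stored without partial configuration information
record_by_threshold(symptom, [("A1", "B1")], fitness=76.0, threshold=50.0)
record_by_threshold(symptom, [("A1", "B2")], fitness=40.0, threshold=50.0)
print(symptom)
```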

FIG. 26 is a flowchart showing the failure detection procedure executed by the CPU 11 of the failure event detection apparatus 1 according to Embodiment 4 of the present invention. The CPU 11 of the failure event detection apparatus 1 acquires system configuration information including relation information between the components of the monitoring target system 200 (step S2601). The system configuration information may of course be acquired in advance and stored in the configuration information storage unit 132.

The CPU 11 collects history information, which includes log information of the monitoring target system 200 and/or failure information output by each component when a failure occurs (step S2602), and stores it in the history information storage unit 133 (step S2603). Based on the collected history information and the symptoms stored in the symptom database 131, the CPU 11 identifies the event groups that match the stored symptoms (step S2604).

The CPU 11 extracts partial configuration information, which includes relation information between the component that sent each identified event group and the other components (step S2605), and accepts input of a judgment as to whether a symptom matching the failure that occurred in the monitoring target system has been detected properly (step S2606). The CPU 11 determines whether the accepted judgment indicates normal detection (step S2607).

If the CPU 11 determines that the accepted determination result indicates normal detection (step S2607: YES), it creates normal detection configuration information by adding information indicating normal detection to the partial configuration information, and updates the symptom database 131 by attaching this information to the applied symptom (step S2608). If the CPU 11 determines that the accepted determination result indicates that detection was not normal (step S2607: NO), it creates erroneous detection configuration information by adding information indicating erroneous detection to the partial configuration information, and updates the symptom database 131 by attaching this information to the applied symptom (step S2609). In this way, a symptom is generated to which the partial configuration information is attached together with information indicating whether the detection was normal.
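
The branch in steps S2607 to S2609 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Symptom` class, the `update_symptom` function, and the representation of partial configuration information as a set of component-to-component links are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Symptom:
    """Hypothetical stand-in for one entry of the symptom database 131."""
    rule: str
    normal_configs: list = field(default_factory=list)  # normal detection configuration information
    false_configs: list = field(default_factory=list)   # erroneous detection configuration information


def update_symptom(symptom, partial_config, detected_normally):
    """Attach the extracted partial configuration to the applied symptom.

    detected_normally is the accepted determination result (step S2606).
    """
    if detected_normally:            # step S2607: YES, then step S2608
        symptom.normal_configs.append(partial_config)
    else:                            # step S2607: NO, then step S2609
        symptom.false_configs.append(partial_config)
    return symptom


# Assumed representation: partial configuration information as a frozenset of links.
symptom = Symptom(rule="DB connection timeout")
config = frozenset({("app_server", "db_server")})
update_symptom(symptom, config, detected_normally=True)
update_symptom(symptom, config, detected_normally=False)
```

Keeping the two kinds of configuration information in separate lists mirrors the labels that the description attaches to the partial configuration information; a single list of labeled records would serve equally well.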

Thus, even for a symptom that had no partial configuration information attached when it was generated, once normal detection configuration information is newly attached, it is guaranteed that applying the symptom detects the failure event correctly. When erroneous detection configuration information is newly attached, reducing the likelihood that the symptom is applied raises the likelihood of detecting failure events correctly.

As in Embodiment 2, the symptoms to be applied can be prioritized by comparing the acquired system configuration information with the normal detection configuration information or erroneous detection configuration information attached to each symptom and calculating the degree of coincidence between the two for each stored piece of partial configuration information. In this case, as in Embodiment 2, a coincidence degree calculation unit 701 and a symptom extraction unit 702 are provided. The CPU 11 compares the system configuration information acquired by the configuration information acquisition unit 202 with the normal detection configuration information or erroneous detection configuration information that the update unit 2502 attached to the symptom and stored, and calculates the degree of coincidence for each piece of normal detection configuration information or erroneous detection configuration information.

For normal detection configuration information, raising the application priority of symptoms with a high calculated degree of coincidence improves the detection accuracy for events in which a failure has occurred. For erroneous detection configuration information, lowering the application priority of symptoms with a high calculated degree of coincidence likewise improves the detection accuracy. Alternatively, a symptom whose erroneous detection configuration information has a high degree of coincidence may be applied deliberately, and the result used as a detection result for erroneous detection.
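
The prioritization described above can be sketched as follows. The coincidence measure used here (the fraction of a stored configuration's links that also appear in the system configuration) and the way the two scores are combined into one ranking are assumptions made for illustration; this passage does not give the calculation used in Embodiment 2.

```python
def coincidence(system_links, config_links):
    """Assumed measure: share of the stored configuration's links found in the system."""
    if not config_links:
        return 0.0
    return len(system_links & config_links) / len(config_links)


def rank_symptoms(system_links, symptoms):
    """Order symptom names from highest to lowest application priority.

    symptoms is a list of (name, normal_configs, false_configs) tuples.
    A high coincidence with a normal detection configuration raises priority;
    a high coincidence with an erroneous detection configuration lowers it.
    """
    scored = []
    for name, normal_configs, false_configs in symptoms:
        best_normal = max((coincidence(system_links, c) for c in normal_configs), default=0.0)
        best_false = max((coincidence(system_links, c) for c in false_configs), default=0.0)
        scored.append((best_normal - best_false, name))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored]


system = {("web", "app"), ("app", "db")}
symptoms = [
    ("A", [{("app", "db")}], []),      # matches a normal detection configuration
    ("B", [], [{("web", "app")}]),     # matches an erroneous detection configuration
    ("C", [{("app", "cache")}], []),   # no matching link in this system
]
ranking = rank_symptoms(system, symptoms)
```

With these inputs symptom A is applied first and symptom B last, which is the ordering the paragraph describes.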

When erroneous detection configuration information is attached, the fitness of the detection result can also be calculated, as in Embodiment 3, to reduce the applicability of the symptom. For example, the CPU 11 uses the fitness calculation unit 2504 to calculate the fitness between the partial configuration information extracted by the extraction unit 1602 and the partial configuration information included in the symptom that the event group specifying unit 1601 applied to identify the event group. If the fitness determination unit 2505 determines that the calculated fitness is greater than a predetermined value, the configuration is judged to be erroneous detection configuration information, and processing such as subtracting from the evaluation value or lowering the priority is performed. Symptoms with a high fitness can thereby be reliably excluded from the candidate symptoms applied to detect failure events, which improves the detection accuracy for failure events.
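
A minimal sketch of this threshold check follows. The fitness measure (Jaccard similarity of link sets), the threshold of 0.5, and the penalty of 1.0 are all hypothetical values chosen for the example; the description only states that the evaluation value is reduced or the priority lowered when the fitness exceeds a predetermined value.

```python
def fitness(extracted_links, stored_links):
    """Assumed measure: Jaccard similarity between two sets of links."""
    union = extracted_links | stored_links
    if not union:
        return 0.0
    return len(extracted_links & stored_links) / len(union)


def penalize(evaluation, extracted_links, false_configs, threshold=0.5, penalty=1.0):
    """Subtract a penalty for each erroneous detection configuration whose
    fitness with the extracted partial configuration exceeds the threshold."""
    for stored_links in false_configs:
        if fitness(extracted_links, stored_links) > threshold:
            evaluation -= penalty
    return evaluation


extracted = {("app", "db"), ("app", "cache")}
false_configs = [
    {("app", "db"), ("app", "cache")},  # identical to the extracted configuration
    {("web", "app")},                   # unrelated configuration
]
evaluation = penalize(3.0, extracted, false_configs)
```

Only the first stored configuration exceeds the threshold, so the evaluation value drops by a single penalty.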

A plurality of pieces of normal detection configuration information and erroneous detection configuration information may be attached to a single symptom. FIG. 27 shows an example of the data structure of the symptom database 131 to which normal detection configuration information and erroneous detection configuration information have been attached. As shown in FIG. 27, a plurality of pieces of normal detection configuration information 271 and a plurality of pieces of erroneous detection configuration information 272 are each attached in association with one symptom 273.

For example, when a plurality of pieces of normal detection configuration information 271, 271, ... are associated with one symptom, a counting unit (not shown) may be provided that counts the number of times each piece of normal detection configuration information 271, 271, ... has been attached, and partial configuration information that has been attached many times is applied preferentially. In this way, partial configuration information that has often been judged to yield normal detection is applied preferentially, which improves the detection accuracy for events in which a failure has occurred.

Conversely, when a plurality of pieces of erroneous detection configuration information 272, 272, ... are associated with one symptom, a counting unit (not shown) may be provided that counts the number of times each piece of erroneous detection configuration information 272, 272, ... has been attached, and processing such as lowering the application priority is performed for partial configuration information that has been attached many times. This reduces the likelihood of applying partial configuration information that has often been judged to yield erroneous detection, which improves the detection accuracy for events in which a failure has occurred.
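
The counting unit can be sketched with two counters, as below. The `ConfigCounter` class, the string labels used to identify configurations, and the choice to combine both counts into a single priority score are assumptions for illustration; the description treats the two counts separately and does not specify how they are combined.

```python
from collections import Counter


class ConfigCounter:
    """Hypothetical counting unit: tracks how often each configuration was
    attached as normal detection or as erroneous detection."""

    def __init__(self):
        self.normal = Counter()
        self.false = Counter()

    def record(self, config_id, detected_normally):
        target = self.normal if detected_normally else self.false
        target[config_id] += 1

    def priority(self, config_id):
        # Assumed scoring: normal detections raise priority, erroneous ones lower it.
        return self.normal[config_id] - self.false[config_id]

    def ordered(self):
        """Configuration ids from highest to lowest priority (ties broken by id)."""
        config_ids = set(self.normal) | set(self.false)
        return sorted(config_ids, key=lambda c: (-self.priority(c), c))


counter = ConfigCounter()
for _ in range(3):
    counter.record("app-db", detected_normally=True)
for _ in range(2):
    counter.record("web-app", detected_normally=False)
counter.record("app-cache", detected_normally=True)
order = counter.ordered()
```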

As described above, according to Embodiment 4, even when a symptom does not hold partial configuration information, partial configuration information can be attached to the symptom according to the detection result for the failure event. The attached partial configuration information therefore makes it easy to determine which symptom should be used preferentially, and the detection accuracy for failure events can be improved. In addition, by also storing the partial configuration information from cases in which a failure event was detected erroneously, the fitness with the partial configuration information used at the time of erroneous detection can be presented as well, so that symptoms can be ranked for application more accurately on the basis of fitness.

The present invention is not limited to the above embodiments, and various changes and improvements are possible within the scope of the invention. For example, the symptom database, the configuration information storage unit, and the history information storage unit may be provided in a storage device of an external computer connected via a network to the failure event detection apparatus according to the present embodiments, and the various kinds of information may be read and written as needed.

FIG. 1 is a block diagram showing a configuration example of the failure event detection apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a functional block diagram of the failure event detection apparatus according to Embodiment 1 of the present invention.
FIG. 3 shows an example of the structure of a symptom that includes partial configuration information in the failure event detection apparatus according to Embodiment 1 of the present invention.
FIG. 4 shows an example of a screen presented on the display device by the partial configuration information presentation unit.
FIG. 5 shows an example of stored partial configuration information.
FIG. 6 is a flowchart showing the procedure by which the CPU of the failure event detection apparatus according to Embodiment 1 of the present invention attaches partial configuration information.
FIG. 7 is a functional block diagram of the failure event detection apparatus according to Embodiment 2 of the present invention.
FIG. 8 shows an example of a screen presented on the display device by the partial configuration information presentation unit.
FIG. 9 is a flowchart showing the symptom extraction procedure of the CPU of the failure event detection apparatus according to Embodiment 2 of the present invention.
FIG. 10 is a flowchart showing the coincidence degree calculation procedure of the CPU of the failure event detection apparatus according to Embodiment 2 of the present invention.
FIG. 11 is a flowchart showing the coincidence degree calculation procedure of the CPU of the failure event detection apparatus according to Embodiment 2 of the present invention.
FIG. 12 is a schematic diagram showing an example of coincidence degree calculation when the same configuration as the partial configuration information exists in the system configuration information.
FIG. 13 is a schematic diagram showing an example of coincidence degree calculation when a link of the partial configuration information exists indirectly in the system configuration information.
FIG. 14 is a schematic diagram showing an example of coincidence degree calculation when the same components as in the partial configuration information exist but the link does not.
FIG. 15 is a schematic diagram showing an example of coincidence degree calculation when the same components as in the partial configuration information do not exist.
FIG. 16 is a functional block diagram of the failure event detection apparatus according to Embodiment 3 of the present invention.
FIG. 17 shows an example of a screen presented on the display device by the presentation unit.
FIG. 18 is a flowchart showing the failure detection procedure of the CPU of the failure event detection apparatus according to Embodiment 3 of the present invention.
FIG. 19 is a flowchart showing the fitness calculation procedure of the CPU of the failure event detection apparatus according to Embodiment 3 of the present invention.
FIG. 20 is a flowchart showing the fitness calculation procedure of the CPU of the failure event detection apparatus according to Embodiment 3 of the present invention.
FIG. 21 is a schematic diagram explaining the concept of the fitness between the partial configuration information attached to a symptom and the system configuration information.
FIG. 22 is a schematic diagram showing an example of fitness calculation when the same configuration as the partial configuration information attached at symptom generation exists in the system configuration information.
FIG. 23 is a schematic diagram showing an example of fitness calculation when a link of the partial configuration information attached at symptom generation exists indirectly in the system configuration information.
FIG. 24 is a schematic diagram showing an example of coincidence degree calculation when the same components as in the partial configuration information attached at symptom generation exist but the link does not.
FIG. 25 is a functional block diagram of the failure event detection apparatus according to Embodiment 4 of the present invention.
FIG. 26 is a flowchart showing the failure detection procedure of the CPU of the failure event detection apparatus according to Embodiment 4 of the present invention.
FIG. 27 shows an example of the data structure of the symptom database to which normal detection configuration information and erroneous detection configuration information have been attached.

Explanation of symbols

1 Failure event detection apparatus
11 CPU
12 Memory
13 Storage device
14 I/O interface
15 Communication interface
16 Video interface
17 Portable disk drive
18 Internal bus
23 Display device
90 Portable recording medium
100 Computer program
131 Symptom database
132 Configuration information storage unit
133 History information storage unit

Claims

An apparatus comprising:
history information collection means for collecting history information of a system that includes at least a plurality of components, the history information including either log information of the system or failure information output from each component when a failure occurs in the system;
symptom storage means for storing a symptom in which predetermined additional information is added to a detection rule for detecting an event, included in the components, that relates to a failure that has occurred;
event group specifying means for specifying an event group that matches the symptom based on the collected history information and the stored symptom;
extraction means for extracting, based on the specified event group, partial configuration information, which is system configuration information including relation information between the components that sent the events of the event group and other components;
correct/incorrect information acquisition means for acquiring, based on the specified event group and the extracted partial configuration information, correct/incorrect information indicating whether an event in which a failure occurred was detected normally; and
update means for updating the symptom on which the specification of the event group was based, based on the acquired correct/incorrect information and the extracted partial configuration information.

The apparatus of claim 1, further comprising system configuration information acquisition means for acquiring system configuration information, which is configuration information of the system,
wherein the symptom storage means adds to the symptom, as the additional information, partial configuration information, which is the portion of the acquired system configuration information that relates to a component that sent an event on which the stored symptom is based.

A method comprising:
  collecting history information of a system that includes at least a plurality of components, the history information including either log information of the system or failure information output from each component when a failure occurs in the system;
  storing a symptom in which predetermined additional information is added to a detection rule for detecting an event, included in the components, that relates to a failure that has occurred;
  specifying an event group that matches the symptom based on the collected history information and the stored symptom;
  extracting, based on the specified event group, partial configuration information, which is system configuration information including relation information between the components that sent the events of the event group and other components;
  acquiring, based on the specified event group and the extracted partial configuration information, correct/incorrect information indicating whether an event in which a failure occurred was detected normally; and
  updating the symptom on which the specification of the event group was based, based on the acquired correct/incorrect information and the extracted partial configuration information.

A computer program executable on a computer that supports detection of a failure event in a system having a plurality of components, the computer program causing the computer to function as:
  history information collection means for collecting history information of the system, the history information including at least either log information of the system or failure information output from each component when a failure occurs in the system;
  symptom storage means for storing a symptom in which predetermined additional information is added to a detection rule for detecting an event, included in the components, that relates to a failure that has occurred;
  event group specifying means for specifying an event group that matches the symptom based on the collected history information and the stored symptom;
  extraction means for extracting, based on the specified event group, partial configuration information, which is system configuration information including relation information between the components that sent the events of the event group and other components;
  correct/incorrect information acquisition means for acquiring, based on the specified event group and the extracted partial configuration information, correct/incorrect information indicating whether an event in which a failure occurred was detected normally; and
  update means for updating the symptom on which the specification of the event group was based, based on the acquired correct/incorrect information and the extracted partial configuration information.

The computer program according to claim 4, further causing the computer to function as system configuration information acquisition means for acquiring system configuration information, which is configuration information of the system,
and causing the symptom storage means to function as means for adding to the symptom, as the additional information, partial configuration information, which is the portion of the acquired system configuration information that relates to a component that sent an event on which the stored symptom is based.

An apparatus comprising:
  history information collection means for collecting history information of a system that includes at least a plurality of components, the history information including either log information of the system or failure information output from each component when a failure occurs in the system;
  symptom storage means for storing a symptom in which predetermined additional information is added to a detection rule for detecting an event, included in the components, that relates to a failure that has occurred;
  event group specifying means for specifying an event group that matches the symptom based on the collected history information and the stored symptom;
  extraction means for extracting, based on the specified event group, partial configuration information, which is system configuration information including relation information between the components that sent the events of the event group and other components;
  fitness calculation means for calculating a fitness between the partial configuration information extracted by the extraction means and the partial configuration information included in the symptom that the event group specifying means applied to specify the event group;
  determination means for determining whether the calculated fitness is greater than a predetermined value; and
  update means for updating the symptom by adding to the symptom, when the determination means determines that the fitness is greater than the predetermined value, normal detection configuration information obtained by adding information indicating normal detection to the partial configuration information extracted by the extraction means, and, when the determination means determines that the fitness is equal to or less than the predetermined value, erroneous detection configuration information obtained by adding information indicating erroneous detection to the partial configuration information extracted by the extraction means.