JP2004320267A - Fault notice apparatus and fault notice method

Info

Publication number: JP2004320267A
Application number: JP2003109679A
Authority: JP
Inventors: Shinji Nimase; 真司二間瀬
Original assignee: NEC Software Chubu Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 2003-04-15
Filing date: 2003-04-15
Publication date: 2004-11-11
Anticipated expiration: 2023-04-15
Also published as: JP3902564B2

Abstract

PROBLEM TO BE SOLVED: To provide a fault notice apparatus and a fault notice method that reduce the load on a maintenance center receiving notices of faults occurring in a supervised apparatus.

SOLUTION: The fault notice apparatus comprises an input section, a definition table, a filter section, a notice section, and a registration section, and automatically updates, as the apparatus operates, the fault types that it reports to the maintenance center. Fault information that the supervised apparatus sends toward the maintenance center includes a fault type denoting the kind of fault that occurred and a fault occurrence time. The input section collects fault information from the apparatus supervised by the maintenance center. The definition table defines the fault types to be reported to the maintenance center. The filter section decides, on the basis of the definition table, which of the fault information items received by the input section are to be reported to the maintenance center. The notice section reports fault information to the maintenance center in response to the decision of the filter section. The registration section extracts, from the fault information items collected by the input section, the fault types of fault information that should be reported to the maintenance center, and updates the definition table with the extracted fault types.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

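[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a fault reporting device that is installed between a monitored device and a maintenance center that monitors the monitored device and that screens the fault information reported from the monitored device to the maintenance center, and to a fault reporting method for screening such fault information.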
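[0002]
[Prior art]
In systems in which a monitored device reports faults to a maintenance center, installing a manager (device) between the monitored device and the maintenance center is a conventionally known technique for reducing the burden on the maintenance center.
[0003]
Japanese Patent Application Laid-Open No. 07-183932 discloses a management-information communication scheme for a hierarchical network management system in which, of the trap information issued by agents, only the trap information needed for communication network management is passed to the integrated manager. In this scheme, each sub-manager installed between the agents and the integrated manager has a filter MIB that stores filtering conditions and a filter function that, on receiving trap information, filters it by referring to the filter conditions in the filter MIB. Filter conditions are changed and added at the request of the integrated manager. This reduces the amount of trap information the integrated manager receives and thus its burden.
[0004]
Japanese Patent Application Laid-Open No. 2001-331350 discloses a maintenance management device that can forecast serious faults caused by the interrelation of several minor events occurring in a computer system. The relation between combinations of operating conditions occurring within a predetermined time difference and the faults that may result is stored in advance in a relation database. The device comprises an operation recording device that records operating conditions occurring in the computer system, together with their occurrence times, in an operating-condition record file, and a record monitoring device that monitors the operating-condition record file and, when a combination of operating conditions exists in the relation database and the difference between their occurrence times is within the predetermined time difference, outputs to a display device a forecast of the fault that the relation database indicates may follow.
[0005]
Regarding feature extraction, Japanese Patent Application Laid-Open No. 2000-242625 discloses an attribute selection technique using association rules that can extract the features of the data and appropriately select the attributes to visualize even when the original data to be analyzed mixes character and numeric attributes and has no target attribute. When transaction-format data is input, association rules are extracted. For every extracted rule, the value Mi obtained by multiplying the confidence Ci by the support Si is calculated, and rules are selected in descending order of Mi. Items are selected from each selected rule in the order body, then head. When the body or head contains several items, they are selected starting from the item that occurs most often, and the attributes to which the selected items belong are selected in that order.
[0006]
Regarding diagnosis of the cause of a fault, Japanese Patent Application Laid-Open No. 06-149577 discloses a technique for efficiently diagnosing faults caused by physical failures of equipment and communication paths, operator errors, program defects, malfunctions due to incorrect parameter settings, and the like. In this fault diagnosis method and apparatus for equipment and communication paths, a diagnostic knowledge base is built from diagnostic trees in which diagnostic data describing fault states is arranged hierarchically as a plurality of nodes. The apparatus comprises a diagnostic node designation unit that designates the node in the diagnostic knowledge base most relevant to the fault to be handled, an inspection unit that inspects the content corresponding to the designated node, and an inspection result determination unit that selects, from the candidate nodes, the node indicating the most probable fault state.
[0007]
Japanese Patent Application Laid-Open No. 07-200499 discloses a fault diagnosis apparatus that reduces the required main memory size, makes execution of the diagnostic program more efficient, and performs fast and accurate fault diagnosis. An input device, a display device, and a storage device are connected to a central processing unit, and the central processing unit is provided with a diagnostic device that diagnoses faults. The storage device holds a plurality of items of diagnostic data described in correspondence with fault states. The diagnostic device has a diagnostic tree selection unit that selects, from the nodes indicating fault states in the knowledge base, the diagnostic tree and node relevant to the fault to be handled, an inspection unit that performs an inspection according to the content described in the designated node, and an inspection result determination unit that selects, on the basis of the inspection results, the node indicating the most probable fault state and outputs the result to the diagnostic tree instruction unit or the display device.
[0008]
Furthermore, Japanese Patent Application Laid-Open No. 08-077260 discloses a fault countermeasure support system that responds to reported fault information. The system is configured to guide and support, accurately, the series of work steps needed to deal with a fault occurring in equipment at a customer site. Based on fault report information sent from the customer equipment to the support system, a fault countermeasure reception processing unit searches the information files, identifies the formal model name of the faulty equipment from the search results, estimates the cause of the fault (for example, a defective part), arranges for parts, and notifies the related departments through a related-department terminal control unit. During this process, a terminal display guidance processing unit displays guidance for the person handling the fault on the terminal screen, for example the next task to perform or the data to enter, and prompts for input of the necessary data.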
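[0009]
[Patent Document 1]
Japanese Patent Application Laid-Open No. 07-183932
Japanese Patent Application Laid-Open No. 2001-331350
Japanese Patent Application Laid-Open No. 2000-242625
Japanese Patent Application Laid-Open No. 06-149577
Japanese Patent Application Laid-Open No. 07-200499
Japanese Patent Application Laid-Open No. 08-077260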
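[0010]
[Problems to be solved by the invention]
An object of the present invention is to provide a fault reporting device and a fault reporting method that reduce the burden on a maintenance center to which faults of a monitored device are reported.
[0011]
Another object of the present invention is to provide a fault reporting device and a fault reporting method that set the conditions for selectively reporting faults of a monitored device automatically rather than manually.
[0012]
Another object of the present invention is to provide a fault reporting device and a fault reporting method that set the conditions for selectively reporting faults of a monitored device without error.
[0013]
Still another object of the present invention is to provide a fault reporting device and a fault reporting method that shorten the time required for maintenance.
[0014]
[Means for Solving the Problems]
The means for solving the problems are described below using the numbers and symbols used in [Embodiments of the Invention]. These numbers and symbols are added to clarify the correspondence between the [Claims] and the [Embodiments of the Invention]. They must not, however, be used to interpret the technical scope of the invention described in the [Claims].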
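[0015]
According to an aspect of the present invention, a fault reporting device (11) comprises an input unit (21), a definition table (34, 35), a filter unit (22, 23), a reporting unit (24), and a registration unit (27, 28, 29, 31, 32), and automatically updates the fault types to be reported to a maintenance center (14) as the fault reporting device (11) operates. Fault information reported from a monitored device (12) to the maintenance center (14) includes a fault type indicating the kind of fault that occurred and an occurrence time indicating when it occurred. The input unit (21) collects fault information from the monitored device (12) monitored by the maintenance center (14). The definition table (34, 35) defines the fault types to be reported to the maintenance center (14). The filter unit (22, 23) determines, based on the definition table (34, 35), which of the fault information input to the input unit (21) is to be reported to the maintenance center (14). The reporting unit (24) reports fault information to the maintenance center (14) that monitors the monitored device (12), in response to the determination made by the filter unit (22, 23). The registration unit (27, 28, 29, 31, 32) extracts, based on the fault information collected by the input unit (21), the fault types of fault information that should be reported to the maintenance center (14), and updates the definition table (35) with the extracted fault types.
[0016]
In the fault reporting device (11) of the present invention, the registration unit (27, 28, 29, 31, 32) comprises a data shaping unit (27), an association rule analysis unit (28), and a report definition registration unit (29). The data shaping unit (27) stores fault information in a related report log (31) each time fault information is input. The association rule analysis unit (28) extracts from the related report log (31) association rules that associate the fault type of a fault with the related fault types of related faults occurring before or after it, and stores the results in an association rule table (32). The report definition registration unit (29) refers to the definition table (34, 35) and the association rule table (32) and registers in the definition table (35) the fault types of fault information that should be reported to the maintenance center (14).
[0017]
In the fault reporting device (11) of the present invention, the first fault information of a first fault occurring in the monitored device (12) includes a first occurrence time and a first fault type. The second fault information of a second fault of the monitored device (12) occurring within a predetermined time T before or after the first occurrence time includes a second fault type. The data shaping unit (27) stores the first fault type, the first occurrence time, and the second fault type in the related report log (31) in association with one another.
[0018]
In the fault reporting device (11) of the present invention, the association rule analysis unit (28) extracts association rules from a predetermined number of items of fault information stored in the related report log (31). Limiting the number of items from which rules are extracted makes it possible to follow changes in fault occurrence caused by changes in operating conditions and the like, while still securing enough samples for analysis, so improved accuracy can also be expected.
[0019]
In the fault reporting device (11) of the present invention, the association rule analysis unit (28) extracts association rules from, among the fault information stored in the related report log (31), the fault information of faults that occurred within a predetermined time of the moment at which the rules are extracted. Limiting the time span in this way makes it possible to extract association rules that match fault occurrence under recent operating conditions.
[0020]
In the fault reporting device (11) of the present invention, the definition table (34, 35) includes a report definition table (34) and a correlation report definition table (35). The fault types to be reported to the maintenance center (14) are registered in the report definition table (34) in advance. Fault types to be reported to the maintenance center (14) are registered in the correlation report definition table (35) by the report definition registration unit (29). The filter unit (22, 23) includes a report filter unit (22) and a correlation report filter unit (23). The report filter unit (22) causes the reporting unit (24) to report to the maintenance center (14) the fault information whose fault type is registered in the report definition table (34). The correlation report filter unit (23) causes the reporting unit (24) to report to the maintenance center (14) the fault information whose fault type is registered in the correlation report definition table (35).
[0021]
In the fault reporting device (11) of the present invention, a flag indicating that the report was made by reference to the correlation report definition table (35) is added to fault information whose fault type is registered in the correlation report definition table (35). The reporting unit (24) reports the flagged fault information to the maintenance center (14).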
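[0022]
According to an aspect of the present invention, a fault reporting method comprises an input step, a filter step, a reporting step, and a registration step, and automatically updates the fault types to be reported to a maintenance center (14) as the fault reporting device (11) using the method operates. Fault information reported from a monitored device (12) to the maintenance center (14) includes a fault type indicating the kind of fault that occurred and an occurrence time indicating when it occurred. The input step collects fault information from the monitored device (12) monitored by the maintenance center (14). A definition table (34, 35) defines the fault types to be reported to the maintenance center (14). The filter step determines, based on the definition table (34, 35), which of the fault information input in the input step is to be reported to the maintenance center (14). The reporting step reports fault information to the maintenance center (14) that monitors the monitored device (12), in response to the determination made in the filter step. The registration step extracts, based on the fault information collected in the input step, the fault types of fault information that should be reported to the maintenance center (14), and updates the definition table (35) with the extracted fault types.
[0023]
In the fault reporting method of the present invention, the registration step comprises a data shaping step, an association rule analysis step, and a report definition registration step. The data shaping step stores fault information in the related report log (31) each time fault information is input. The association rule analysis step extracts from the related report log (31) association rules that associate the fault type of a fault with the related fault types of related faults occurring before or after it, and stores the results in the association rule table (32). The report definition registration step refers to the definition table (34, 35) and the association rule table (32) and adds to the definition table (35) the fault types of fault information that should be reported to the maintenance center (14).
[0024]
In the fault reporting method of the present invention, the first fault information of a first fault occurring in the monitored device (12) includes a first occurrence time and a first fault type. The second fault information of a second fault of the monitored device (12) occurring within a predetermined time T before or after the first occurrence time includes a second fault type. The data shaping step stores the first fault type, the first occurrence time, and the second fault type in the related report log (31) in association with one another.
[0025]
In the fault reporting method of the present invention, the association rule analysis step extracts association rules from a predetermined number of items of fault information stored in the related report log (31).
[0026]
In the fault reporting method of the present invention, the association rule analysis step extracts association rules from, among the fault information stored in the related report log (31), the fault information of faults that occurred within a predetermined time of the moment at which the rules are extracted.
[0027]
In the fault reporting method of the present invention, the definition table (34, 35) includes a report definition table (34) in which the fault types to be reported to the maintenance center (14) are registered in advance, and a correlation report definition table (35) in which fault types to be reported to the maintenance center (14) are registered by the registration step. The filter step includes a report filter step that causes the reporting step to report to the maintenance center (14) the fault information whose fault type is registered in the report definition table (34), and a correlation report filter step that causes the reporting step to report to the maintenance center (14) the fault information whose fault type is registered in the correlation report definition table (35).
[0028]
In the fault reporting method of the present invention, a flag indicating that the report was made by reference to the correlation report definition table (35) is added to fault information whose fault type is registered in the correlation report definition table (35). The reporting step reports the flagged fault information to the maintenance center (14). The flag makes it possible to tell whether a reported fault is one whose reporting was defined in advance or one that was additionally registered as a related fault, which shortens maintenance work such as fault analysis.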
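[0029]
[Embodiments of the Invention]
An embodiment of the present invention is described below, referring to the fault reporting device of the present invention as a monitoring manager. FIG. 1 is a block diagram showing the configuration of a fault monitoring system that uses the monitoring manager of the present invention. The fault monitoring system includes a monitored device 12, a monitoring manager 11, and a maintenance center 14. The monitored device 12 attempts to report every fault to the maintenance center 14 regardless of its severity. The reported fault information includes a fault type indicating the kind of fault that occurred (for example, "disk fault" or "database inaccessible") and an occurrence time, and is sent to the monitoring manager 11. The monitoring manager 11 screens the reported fault information and forwards to the maintenance center 14 only the fault information that is useful to it. The maintenance center 14 performs maintenance work based on the fault information it receives.
[0030]
Although FIG. 1 shows only one monitored device 12, there may be several. Processing fault information reported from multiple monitored devices 12 increases the number of samples used to estimate the fault information selection conditions described below, so accuracy can be expected to improve. Alternatively, processing fault information separately for each monitored device 12 makes it possible to respond to the individual circumstances of each device.
[0031]
A maintenance center 14, maintenance terminal, or the like may be located both near the monitored device 12 and at a remote site, and there may be more than one maintenance center 14. The monitored device 12 and the monitoring manager 11, and the monitoring manager 11 and the maintenance center 14, may be connected through a communication network.
[0032]
Because fault information is screened before it is reported, fault information that is unimportant for maintaining the monitored device 12 has already been removed when it reaches the maintenance center 14. This reduces the burden of analyzing fault information and shortens maintenance time.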
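[0033]
The monitoring manager 11 is an information processing device such as a workstation, and comprises a fault information input unit 21, a report filter unit 22, a correlation report filter unit 23, a fault information reporting unit 24, a data shaping unit 27, an association rule analysis unit 28, a report definition registration unit 29, a related report log 31, an association rule table 32, a report definition table 34, and a correlation report definition table 35.
[0034]
The report definition table 34 and the correlation report definition table 35 are definition tables that store the conditions for selecting fault information. Each stores fault types, which indicate kinds of faults, in list form. Fault information whose fault type is registered in a definition table is reported to the maintenance center 14. Fault information whose fault type is not registered in either definition table is discarded without being reported to the maintenance center 14.
[0035]
The report definition table 34 holds, registered in advance, the fault types of faults that relate directly to maintenance of the monitored device 12. The registered fault types are fixed and are never updated automatically. Maintenance personnel can of course register fault types that become necessary in maintenance operation and delete those that are no longer needed, but must take great care not to make mistakes when doing so.
[0036]
The correlation report definition table 35 holds the fault types of faults that relate indirectly to maintenance of the monitored device 12. Such a fault is minor in itself, can be ignored in maintenance operation, and would not need to be reported to the maintenance center 14 on its own merits, but is presumed to be related to the occurrence of a fault whose type is registered in the report definition table 34. The correlation report definition table 35 is updated automatically by the report definition registration unit 29 based on the results of analysis by the association rule analysis unit 28.
[0037]
Separating the definition table into a fixed part and a variable part in this way prevents the important fixed part from being set incorrectly by the automatic update function. Making the fixed part explicit can also be expected to prevent human error during maintenance.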
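[0038]
The fault information input unit 21 has an interface to the monitored device 12 and collects fault information from it. The collected fault information is sent to the report filter unit 22 and the data shaping unit 27.
[0039]
The report filter unit 22 and the correlation report filter unit 23 are filters that select fault information. The report filter unit 22 receives fault information from the fault information input unit 21, selects the fault information whose fault type is registered in the report definition table 34, and has the fault information reporting unit 24 report it to the maintenance center 14. Fault information whose fault type is not registered in the report definition table 34 is passed to the correlation report filter unit 23.
[0040]
The correlation report filter unit 23 selects, from the fault information passed on by the report filter unit 22, the fault information whose fault type is registered in the correlation report definition table 35, and has the fault information reporting unit 24 report it to the maintenance center 14. Fault information whose fault type is not registered in the correlation report definition table 35 is not reported and is discarded at this point. A flag indicating that the report is made by reference to the correlation report definition table 35 is attached to the fault information that is reported. The flag lets the maintenance center 14 recognize that the fault is one registered in the correlation report definition table 35 and use it for preventive maintenance of the monitored device 12 and similar purposes.
[0041]
With these fault information filters, faults that are not useful for maintaining the monitored device 12, such as minor faults, are no longer reported to the maintenance center 14, which reduces the load on the maintenance center 14.
[0042]
The fault information reporting unit 24 has an interface to the maintenance center 14 and sends to the maintenance center 14 the fault information passed to it by the report filter unit 22 and the correlation report filter unit 23.
[0043]
The data shaping unit 27, the association rule analysis unit 28, the report definition registration unit 29, the related report log 31, and the association rule table 32 together form the registration unit, which generates the conditions for selecting fault information.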
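[0044]
The data shaping unit 27 stores the fault information sent from the fault information input unit 21 in the related report log 31. It also extracts from the related report log 31 the past faults that occurred within a predetermined time T of the occurrence time, and registers in the log the fault type of the fault information received from the fault information input unit 21 as a related fault of each extracted fault. Conversely, it registers the fault type of each extracted fault as a related fault of the received fault. In the following description the predetermined time T is the same for all faults, but it may differ by kind of fault (fault type). When a separate time Tn is used as the threshold for fault type n, the time window in which related faults occur can be tuned for each fault type, which can shorten the analysis performed by the association rule analysis unit 28 and allow the occurrence of correlated faults to be estimated in more detail.
[0045]
As shown in FIG. 2, the related report log 31 stores the fault type and occurrence time of each item of fault information sent from the fault information input unit 21, associated with up to n fault types of related faults. For the sake of explanation a table with a fixed number of related faults is used here, but a variable-length list format can be used instead. A variable-length format saves memory when the number of related faults differs widely from fault to fault.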
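[0046]
The association rule analysis unit 28 periodically analyzes the related report log 31 and extracts rules describing the relations between faults. Association rules are used as the method of analysis. An association rule expresses the fact that when an event A occurs, an event B also occurs, and is generally written as follows.
A ⇒ B
Here, given a predetermined time T and writing the correlation between fault A and fault B as A ⇒ B, when fault A occurs at time ta and fault B occurs within the interval T before or after it, that is,
ta - T < tb < ta + T
when fault B occurs at a time tb satisfying this condition, the correlation A ⇒ B is satisfied. Faults A and B that satisfy both association rules A ⇒ B and B ⇒ A are defined as "faults A and B being related".
[0047]
Confidence and support express the value of a rule. Confidence is the proportion of the a occurrences of event A in which event B also occurred (b occurrences), and is given by b/a. Support is the proportion of all N data items accounted for by the b items that satisfy A ⇒ B, and is given by b/N. For example, when faults occur as in FIG. 2, evaluating the association rule fault C ⇒ fault A gives confidence = 1/2 = 50% and support = 1/4 = 25%. The association rule analysis unit 28 holds thresholds for confidence and support and notifies the report definition registration unit 29 of the association rules whose confidence and support are at or above the thresholds. Methods for finding association rules that meet given confidence and support levels are well known. The Apriori algorithm is one example.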
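As a compact restatement of the two measures defined in this paragraph, the following LaTeX block writes them as formulas and works the FIG. 2 example. The function names conf and supp are notation introduced here for illustration and do not appear in the original text. The symbols a, b, and N are those of the paragraph.

```latex
\[
\operatorname{conf}(A \Rightarrow B) = \frac{b}{a},
\qquad
\operatorname{supp}(A \Rightarrow B) = \frac{b}{N}
\]
% a: number of occurrences of A
% b: number of those occurrences for which B occurred within time T
% N: total number of faults analyzed
\[
\operatorname{conf}(C \Rightarrow A) = \frac{1}{2} = 50\%,
\qquad
\operatorname{supp}(C \Rightarrow A) = \frac{1}{4} = 25\%
\]
```

In the FIG. 2 example, fault type C occurs twice (a = 2), only one of those occurrences has fault type A as a related fault (b = 1), and four faults are analyzed in total (N = 4).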
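[0048]
The confidence and support calculated for each pair of faults stored in the related report log 31 are stored in the association rule table 32 shown in FIG. 3 and passed to the report definition registration unit 29. The association rule table 32 brings together a fault type A, a fault type B related to fault type A, and the confidence and support of that relation.
[0049]
The report definition registration unit 29 follows the association rules stored in the association rule table 32 and, referring to the report definition table 34, registers in the correlation report definition table 35 the fault types of faults that should be reported. A fault to be reported in this sense is one that is correlated with a fault whose type is registered in the report definition table 34 and that is not itself registered in the report definition table 34. In other words, it is a fault that would not be reported to the maintenance center 14 unless it were registered in the correlation report definition table 35. It is not a fault that directly requires reporting for maintenance operation, but one that is expected to be a precursor of an important fault.
[0050]
The monitoring manager 11 of the present invention stores the fault information input from the monitored device 12 in the related report log 31, analyzes the stored fault information for rules relating correlated faults, updates the definition table based on the association rules obtained, and screens (filters) the fault information input from the monitored device 12 by referring to the definition table, which is updated as operation continues, before reporting it to the maintenance center 14. These operations are described below.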
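[0051]
Each time fault information is input from the monitored device 12, the operation of storing it in the related report log 31 repeats the steps shown in the flowchart of FIG. 5. How fault information accumulates in the related report log 31 shown in FIG. 2 is explained below, assuming that faults occur as shown in FIG. 4. Fault 1, of fault type C, occurs at time t1. Fault 2, of fault type A, occurs at time t2, within time T after fault 1. Fault 3, of fault type B, occurs at time t3, within time T after fault 2. Fault 4, of fault type C, occurs at time t4, more than time T after fault 3.
[0052]
When fault 1 occurs at time t1 and its fault information is input from the monitored device 12 to the fault information input unit 21, the data shaping unit 27 takes the fault type C and the occurrence time t1 from the fault information and registers them in the related report log 31. Because no other fault exists when fault 1 occurs, registration is all that happens. Next, when fault 2 occurs at time t2 and its fault information is input from the monitored device 12 to the fault information input unit 21, the data shaping unit 27 takes the fault type A and the occurrence time t2 from the fault information and registers them in the related report log 31 (step S11).
[0053]
To check for related faults, past fault information is retrieved from the related report log 31. When fault 2 is registered, fault 1 is already in the related report log 31, so the data for fault 1 is retrieved (step S13).
[0054]
The occurrence time of the retrieved data is compared to see whether it lies within the time T that defines related faults (step S16). Fault 1 occurred at time t1, so its time difference from fault 2 is within T (step S16-YES), and fault 1 and fault 2 are registered as related faults. The fault type A of fault 2 is registered in the related report log 31 as related fault type 1 of fault 1, and the fault type C of fault 1 as related fault type 1 of fault 2 (step S18). Because the related report log 31 holds no further data, processing of fault 2 ends.
[0055]
Next, when fault 3 occurs at time t3, the fault type B and occurrence time t3 of fault 3 are first registered in the related report log 31 in the same way as for faults 1 and 2 (step S11). The past faults in the related report log 31 are searched and fault 2, of fault type A, which occurred at time t2, is extracted (step S13). Because time t2 is within time T of time t3 (step S16-YES), the related fault types are registered in the related report log 31. The fault type B of fault 3 is registered as related fault type 2 of fault 2, and the fault type A of fault 2 as related fault type 1 of fault 3 (step S18).
[0056]
Searching further back, fault 1, of fault type C, which occurred at time t1, is extracted (step S13). Because time T or more separates time t1 from time t3, processing of fault 3 ends (step S16-NO).
[0057]
Next, when fault 4 occurs at time t4, the fault type C and occurrence time t4 of fault 4 are registered in the related report log 31 in the same way (step S11). The past faults in the related report log 31 are searched and fault 3, of fault type B, which occurred at time t3, is extracted (step S13). Because time T or more separates time t3 from time t4, processing of fault 4 ends (step S16-NO). In this way the related fault types shown in FIG. 2 are registered.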
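The logging procedure walked through above (steps S11 to S18 of FIG. 5) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `LogEntry` and `record_fault`, the numeric times, and the window value 10 are assumptions chosen so that the four faults reproduce the related fault types described for FIG. 2. The sketch also assumes faults arrive in chronological order and searches the log before appending the new entry, which gives the same result as registering first.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    """One row of the related report log: type, time, related fault types."""
    fault_type: str
    time: float
    related: list = field(default_factory=list)


def record_fault(log, fault_type, time, window):
    """Register a fault and cross-register related faults within `window`."""
    entry = LogEntry(fault_type, time)
    # Walk past faults from newest to oldest (S13).
    for past in reversed(log):
        if time - past.time >= window:
            break  # S16-NO: older entries are even further away
        # S16-YES / S18: register each fault as related to the other.
        past.related.append(entry.fault_type)
        entry.related.append(past.fault_type)
    log.append(entry)  # corresponds to the registration of S11
    return entry


# Hypothetical times: t1=0, t2=6, t3=12, t4=30, with T=10.
log = []
for ftype, t in [("C", 0), ("A", 6), ("B", 12), ("C", 30)]:
    record_fault(log, ftype, t, window=10)

for e in log:
    print(e.fault_type, e.time, e.related)
```

A variable-length list is used for the related fault types, which is the alternative to the fixed-width table that the description itself mentions.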
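[0058]
The operation of analyzing the fault information stored in the related report log 31 for rules relating correlated faults and storing them in the association rule table 32 is described with reference to FIG. 6. The analysis is performed periodically by the association rule analysis unit 28, but it may instead be performed in parallel as fault information is input and stored in the related report log 31.
[0059]
The range over which rules are analyzed, that is, the range of data in the related report log 31, is defined by the number of stored entries. The description below analyzes up to a predetermined number of fault occurrences, but the analysis may instead cover the faults going back a fixed length of time from the moment of analysis.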
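The two ways of bounding the analysis range mentioned here, by number of entries or by elapsed time, might look like the following sketch. The function names and the tuple layout `(fault_type, time)` are hypothetical and are used only for illustration.

```python
def last_n_faults(log, n):
    """Count-based range: keep only the n most recent log entries."""
    return log[-n:] if n > 0 else []


def faults_since(log, now, horizon):
    """Time-based range: keep entries that occurred within `horizon` of `now`."""
    return [entry for entry in log if now - entry[1] <= horizon]


# Hypothetical log of (fault_type, occurrence_time) pairs, oldest first.
log = [("C", 0), ("A", 6), ("B", 12), ("C", 30)]

print(last_n_faults(log, 2))
print(faults_since(log, now=30, horizon=20))
```

A count-based range keeps the sample size constant, while a time-based range tracks recent operating conditions more closely, which matches the trade-off the description draws between the two options.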
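[0060]
The association rule analysis unit 28 sets a pointer to the first fault in the analysis range among the faults stored in the related report log 31, and clears the counters for the total number N of faults analyzed and for the number of each fault type. Here the analysis starts from the oldest fault in the analysis range (step S21).
[0061]
The fault information indicated by the pointer is retrieved from the related report log 31 and the total number analyzed is incremented by 1. Suppose, for example, that the pointer indicates fault 2 of FIG. 2 (step S22).
[0062]
The data of fault Ai (fault 2 in FIG. 2) is retrieved and the count ai for its fault type is incremented by 1. Fault 2 of FIG. 2 has fault type A, so the counter for fault type A is incremented by 1 (step S23).
[0063]
The data of related fault Bij is retrieved and the count bij for the related fault type is incremented by 1. For fault 2 of FIG. 2, related fault type 1 is fault type C, so the counter for related fault type C with respect to occurring fault type A is incremented by 1 (step S25).
[0064]
Fault 2 of FIG. 2 has related faults up to related fault type 2 (step S27-NO), so related fault type 2 is also counted. Related fault type 2 is fault type B, so the counter for related fault type B with respect to occurring fault type A is incremented by 1 (step S25).
[0065]
For fault 2 of FIG. 2 only related fault types up to 2 are registered, so processing of fault 2 ends and the pointer moves to the next fault (fault 3 of FIG. 2) (step S27-YES).
[0066]
Because the next fault (fault 3 of FIG. 2) is registered (step S28-NO), counting proceeds in the same way for it from step S22.
[0067]
When processing reaches the last fault in the analysis range (fault 4 of FIG. 2) (step S28-YES), the total number N of faults analyzed, the number of occurrences of each fault Ai, and the number of occurrences of each related fault Bij with respect to fault Ai have been counted, so the confidence and support of each rule are calculated. Confidence is the proportion of the ai occurrences of fault Ai in which fault Bij also occurred (bij occurrences), and is given by bij/ai. Support is the proportion of the total number N of faults analyzed accounted for by the bij items that satisfy Ai ⇒ Bij, and is given by bij/N. For example, when faults occur as in FIG. 2, evaluating the association rule fault C ⇒ fault A gives confidence = 1/2 = 50% and support = 1/4 = 25% (step S29).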
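The counting pass of FIG. 6 (steps S21 to S29) can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names `analyze` and `select_rules`, the tuple layout of the log, and the threshold values are all hypothetical. The sample log encodes the four faults of FIG. 2 as described in the text.

```python
from collections import Counter


def analyze(log):
    """Return {(A, B): (confidence, support)} for a related report log.

    `log` is a list of (fault_type, related_fault_types) pairs.
    """
    total = 0               # N: number of faults analyzed (cleared in S21)
    type_count = Counter()  # a_i: occurrences of each fault type
    pair_count = Counter()  # b_ij: occurrences of B as related fault of A

    for fault_type, related in log:      # S22: next fault in the range
        total += 1
        type_count[fault_type] += 1      # S23
        for rel in related:              # S25, repeated until S27-YES
            pair_count[(fault_type, rel)] += 1

    # S29: confidence = b_ij / a_i, support = b_ij / N
    return {
        (a, b): (count / type_count[a], count / total)
        for (a, b), count in pair_count.items()
    }


def select_rules(rules, min_confidence, min_support):
    """Keep rules whose confidence and support both meet the thresholds."""
    return {
        pair: (conf, sup)
        for pair, (conf, sup) in rules.items()
        if conf >= min_confidence and sup >= min_support
    }


# The four faults of FIG. 2: type and related fault types.
fig2_log = [("C", ["A"]), ("A", ["C", "B"]), ("B", ["A"]), ("C", [])]
rules = analyze(fig2_log)
print(rules[("C", "A")])
```

For the rule C ⇒ A this yields confidence 0.5 and support 0.25, matching the worked figures in the paragraph above. `select_rules` corresponds to the threshold test the description assigns to the association rule analysis unit 28.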
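[0068]
The association rules calculated by the association rule analysis unit 28 are stored in the association rule table 32 shown in FIG. 3. Because the stored rules list every pair of faults deemed related on the basis of the time T, they include pairs that are related only by coincidence. The confidence and support in the association rule table 32 are therefore consulted to select the pairs that are actually correlated, and these are taken as the association rules. Since the confidence and support in the association rule table 32 are used only to judge whether a rule is valid, they may be unnecessary when association rules are extracted by some other procedure. This completes the registration of association rules in the association rule table 32.
[0069]
The operation of updating the definition table based on the analyzed association rules is described with reference to FIG. 7. Based on the association rules registered in the association rule table 32 and on the report definition table 34, the report definition registration unit 29 updates the correlation report definition table 35. Rules are registered in the association rule table 32 in the form "a fault of fault type B can be presumed to occur before or after a fault of fault type A". Because of the way the related report log 31 is stored and the association rules are generated, the rule that a fault of fault type A can be presumed to occur before or after a fault of fault type B is also registered. The report definition registration unit 29 therefore only needs to register one of the two faults named in each association rule. If the correlation report definition table 35 is cleared before the report definition registration unit 29 registers the association rules, the resulting table reflects only the rules from the period over which they were extracted. If it is not cleared, the rules from the new period are added to those from the past. The choice should be made according to the operating conditions of the system to which the invention is applied.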
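[0070]
An association rule is taken from the association rule table 32. Suppose the rule taken is fault type A ⇒ fault type B. The report definition registration unit 29 decides whether to register fault type B in the correlation report definition table 35 according to the registration status of fault type A (step S31).
[0071]
It is determined whether fault type A is registered in the report definition table 34, which holds the fault types of faults designated for reporting (step S32). If fault type A is not registered in the report definition table 34, a fault of fault type A need not be reported to the maintenance center 14, so the correlated fault of fault type B need not be reported either, and processing moves to the next association rule (step S32-NO).
[0072]
If fault type A is registered in the report definition table 34, a fault of fault type A should be reported to the maintenance center 14, and the correlated fault of fault type B is also judged to be one that should be reported (step S32-YES).
[0073]
Fault type B, which is correlated with fault type A, is taken out. Even if a fault of fault type B should be reported, it need not be registered again if it is already registered as a fault to be reported, so it is confirmed to be unregistered (step S34).
[0074]
First, it is confirmed that fault type B is not registered in the report definition table 34. If fault type B is registered in the report definition table 34, no registration is needed and processing moves to the next association rule (step S35-YES). If fault type B is not registered in the report definition table 34, its registration status in the correlation report definition table 35 is checked (step S35-NO).
[0075]
If fault type B is already registered in the correlation report definition table 35, no registration is needed and processing moves to the next association rule (step S36-YES). If fault type B is not registered in the correlation report definition table 35 (step S36-NO), fault type B is registered in the correlation report definition table 35 (step S38). Once registered in the correlation report definition table 35, faults of fault type B are reported to the maintenance center 14.
[0076]
Correlated fault types A and B must both be registered in a definition table. In the method of the present invention, whenever fault type A ⇒ fault type B holds, fault type B ⇒ fault type A also holds by construction, so fault type A and fault type B are both registered in a definition table (in either the report definition table 34 or the correlation report definition table 35).
[0077]
The above is repeated until the end of the association rules (step S39-NO). When all association rules have been judged, all of them have been reflected in the correlation report definition table 35 (step S39-YES). The correlation report definition table 35 is updated in this way.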
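The registration loop of FIG. 7 can be sketched as follows. This is a hedged illustration: the function name `update_correlation_table`, the use of Python sets for the two definition tables, and the `clear_first` parameter (modelling the optional clearing of the table before registration) are assumptions introduced here, not details given in the original.

```python
def update_correlation_table(rules, report_table, correlation_table,
                             clear_first=False):
    """Register fault types correlated with pre-registered fault types.

    `rules` is an iterable of (A, B) pairs meaning A => B.
    `report_table` models the fixed report definition table 34.
    `correlation_table` models the correlation report definition table 35.
    """
    if clear_first:
        correlation_table.clear()   # keep only rules from the current period
    for a, b in rules:              # S31: take the next rule A => B
        if a not in report_table:   # S32-NO: A itself is not reported
            continue
        if b in report_table:       # S35-YES: B is already reported via 34
            continue
        if b in correlation_table:  # S36-YES: B is already in table 35
            continue
        correlation_table.add(b)    # S38: register B
    return correlation_table


# Rules taken from the FIG. 2 example; fault type A is assumed to be
# pre-registered in the report definition table.
rules = [("C", "A"), ("A", "C"), ("A", "B"), ("B", "A")]
report_table = {"A"}
correlation_table = update_correlation_table(rules, report_table, set())
print(sorted(correlation_table))
```

Only the correlation table is ever modified, which reflects the stated purpose of splitting the definition table into a fixed part and an automatically updated part.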
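[0078]
The operation of screening (filtering) the fault information input from the monitored device 12 and reporting it to the maintenance center 14, by reference to the correlation report definition table 35, which is updated as the monitoring manager 11 operates, and the report definition table 34, which is fixed, is described with reference to FIG. 8.
[0079]
The fault information input unit 21 receives fault information from the monitored device 12 and sends it to the report filter unit 22 (step S41).
[0080]
The report filter unit 22 determines whether the fault type of the input fault information is registered in the report definition table 34. If it is registered in the report definition table 34, the input fault information is sent to the fault information reporting unit 24 for reporting to the maintenance center 14 (step S43-YES).
[0081]
Faults not registered in the report definition table 34 are sent to the correlation report filter unit 23. The correlation report filter unit 23 determines whether the fault type of the input fault information is registered in the correlation report definition table 35. If it is registered in the correlation report definition table 35 (step S46-YES), a flag indicating that the fault is of a fault type registered in the correlation report definition table 35 is added. The flagged fault information is sent to the fault information reporting unit 24 for reporting to the maintenance center 14 (step S48). The fault information reporting unit 24 reports the fault information to the maintenance center 14 (step S49).
[0082]
Faults whose fault type is not registered in the correlation report definition table 35 either need not be reported to the maintenance center 14 and are discarded (step S46-NO).
[0083]
In this way, faults that should be reported are reported to the maintenance center 14, and fault information that need not be reported is discarded.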
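The two-stage filtering of FIG. 8 can be sketched as below. This is a minimal illustration under stated assumptions: fault information is modelled as a dictionary, and the function name `filter_fault` and the flag key `"correlated"` are hypothetical names that do not appear in the original.

```python
def filter_fault(fault, report_table, correlation_table):
    """Return the fault information to forward, or None to discard it.

    `fault` is a dict with at least a "type" key.
    """
    fault_type = fault["type"]
    if fault_type in report_table:
        # Report filter: pre-registered type, forwarded unchanged.
        return fault
    if fault_type in correlation_table:
        # Correlation report filter: forward a copy carrying the flag.
        return dict(fault, correlated=True)
    # Registered in neither table: not reported.
    return None


report_table = {"A"}
correlation_table = {"B"}

print(filter_fault({"type": "A", "time": 6}, report_table, correlation_table))
print(filter_fault({"type": "B", "time": 12}, report_table, correlation_table))
print(filter_fault({"type": "C", "time": 30}, report_table, correlation_table))
```

Returning a flagged copy rather than mutating the input keeps the entry written to the related report log independent of the filtering result, since the description sends each item of fault information to both the filter and the data shaping unit.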
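[0084]
As described above, the monitoring manager of the present invention sets fault reporting conditions automatically and, as it operates, its filter conditions evolve toward the optimum reporting conditions, so that appropriate fault reports reach the maintenance center. The load on the maintenance center can therefore be reduced and maintenance time shortened.
[0085]
[Effects of the Invention]
According to the present invention, the filter function removes, from the fault reports collected from the monitored device, the minor fault reports that need not be sent to the maintenance center, so a fault reporting device and a fault reporting method that reduce the burden on the maintenance center can be provided.
[0086]
According to the present invention, it is also possible to provide a fault reporting device and a fault reporting method that set the conditions for selectively reporting faults of a monitored device, which differ from system to system, automatically rather than manually.
[0087]
Furthermore, according to the present invention, eliminating error-prone manual setting of conditions makes it possible to provide a fault reporting device and a fault reporting method that set the conditions for selectively reporting faults of a monitored device without error.
[0088]
According to the present invention, reporting related faults also makes it possible to predict the occurrence of faults and to investigate their causes, so a fault reporting device and a fault reporting method that shorten the time required for maintenance can be provided.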
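[Brief description of the drawings]
[FIG. 1] A block diagram showing the configuration of a fault monitoring system and the configuration of a monitoring manager according to an embodiment of the present invention.
[FIG. 2] A diagram showing the structure of the related report log of the embodiment.
[FIG. 3] A diagram showing the structure of the association rule table of the embodiment.
[FIG. 4] A diagram showing the timing of fault occurrences, used to explain how fault information accumulates in the related report log of the embodiment.
[FIG. 5] A flowchart showing the operation of accumulating fault information in the related report log of the embodiment.
[FIG. 6] A flowchart showing the association rule analysis operation of the embodiment.
[FIG. 7] A flowchart showing the operation of registering entries in the correlation report definition table of the embodiment.
[FIG. 8] A flowchart showing the reporting operation of the embodiment.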
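[Explanation of reference signs]
11 Monitoring manager
12 Monitored device
14 Maintenance center
21 Fault information input unit
22 Report filter unit
23 Correlation report filter unit
24 Fault information reporting unit
27 Data shaping unit
28 Association rule analysis unit
29 Report definition registration unit
31 Related report log
32 Association rule table
34 Report definition table
35 Correlation report definition table
[0001]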
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a fault reporting device that is installed between a monitored device and a maintenance center that monitors the monitored device and that screens the fault information reported from the monitored device to the maintenance center, and to a fault reporting method for screening such fault information.
[0002]
[Prior art]
In systems in which a monitored device reports faults to a maintenance center, installing a manager (device) between the monitored device and the maintenance center is a conventionally known technique for reducing the burden on the maintenance center.
[0003]
Japanese Patent Application Laid-Open No. 07-183932 discloses a management-information communication scheme for a hierarchical network management system in which, of the trap information issued by agents, only the trap information needed for communication network management is passed to the integrated manager. In this scheme, each sub-manager installed between the agents and the integrated manager has a filter MIB that stores filtering conditions and a filter function that, on receiving trap information, filters it by referring to the filter conditions in the filter MIB. Filter conditions are changed and added at the request of the integrated manager. This reduces the amount of trap information the integrated manager receives and thus its burden.
[0004]
Japanese Patent Application Laid-Open No. 2001-331350 discloses a maintenance management device that can forecast serious faults caused by the interrelation of several minor events occurring in a computer system. The relation between combinations of operating conditions occurring within a predetermined time difference and the faults that may result is stored in advance in a relation database. The device comprises an operation recording device that records operating conditions occurring in the computer system, together with their occurrence times, in an operating-condition record file, and a record monitoring device that monitors the operating-condition record file and, when a combination of operating conditions exists in the relation database and the difference between their occurrence times is within the predetermined time difference, outputs to a display device a forecast of the fault that the relation database indicates may follow.
[0005]
Regarding feature extraction, Japanese Patent Application Laid-Open No. 2000-242625 discloses an attribute selection technique using association rules that can extract the features of the data and appropriately select the attributes to visualize even when the original data to be analyzed mixes character and numeric attributes and has no target attribute. When transaction-format data is input, association rules are extracted. For every extracted rule, the value Mi obtained by multiplying the confidence Ci by the support Si is calculated, and rules are selected in descending order of Mi. Items are selected from each selected rule in the order body, then head. When the body or head contains several items, they are selected starting from the item that occurs most often, and the attributes to which the selected items belong are selected in that order.
[0006]
Regarding diagnosis of the cause of a fault, Japanese Patent Application Laid-Open No. 06-149577 discloses a technique for efficiently diagnosing faults caused by physical failures of equipment and communication paths, operator errors, program defects, malfunctions due to incorrect parameter settings, and the like. In this fault diagnosis method and apparatus for equipment and communication paths, a diagnostic knowledge base is built from diagnostic trees in which diagnostic data describing fault states is arranged hierarchically as a plurality of nodes. The apparatus comprises a diagnostic node designation unit that designates the node in the diagnostic knowledge base most relevant to the fault to be handled, an inspection unit that inspects the content corresponding to the designated node, and an inspection result determination unit that selects, from the candidate nodes, the node indicating the most probable fault state.
[0007]
Japanese Patent Application Laid-Open No. 07-200499 discloses a fault diagnosis apparatus that reduces the required main memory size, makes execution of the diagnostic program more efficient, and performs fast and accurate fault diagnosis. An input device, a display device, and a storage device are connected to a central processing unit, and the central processing unit is provided with a diagnostic device that diagnoses faults. The storage device holds a plurality of items of diagnostic data described in correspondence with fault states. The diagnostic device has a diagnostic tree selection unit that selects, from the nodes indicating fault states in the knowledge base, the diagnostic tree and node relevant to the fault to be handled, an inspection unit that performs an inspection according to the content described in the designated node, and an inspection result determination unit that selects, on the basis of the inspection results, the node indicating the most probable fault state and outputs the result to the diagnostic tree instruction unit or the display device.
[0008]
Furthermore, Japanese Patent Application Laid-Open No. 08-077260 discloses a fault countermeasure support system that responds to reported fault information. The system is configured to guide and support, accurately, the series of work steps needed to deal with a fault occurring in equipment at a customer site. Based on fault report information sent from the customer equipment to the support system, a fault countermeasure reception processing unit searches the information files, identifies the formal model name of the faulty equipment from the search results, estimates the cause of the fault (for example, a defective part), arranges for parts, and notifies the related departments through a related-department terminal control unit. During this process, a terminal display guidance processing unit displays guidance for the person handling the fault on the terminal screen, for example the next task to perform or the data to enter, and prompts for input of the necessary data.
[0009]
[Patent Document 1]
JP-A-07-183932
JP 2001-331350 A
JP 2000-242625 A
JP 06-149577 A
JP 07-200499 A
JP 08-077260 A
[0010]
[Problems to be solved by the invention]
An object of the present invention is to provide a fault reporting device and a fault reporting method that reduce the burden on a maintenance center to which faults of a monitored device are reported.
[0011]
Another object of the present invention is to provide a fault reporting device and a fault reporting method that set the conditions for selectively reporting faults of a monitored device automatically rather than manually.
[0012]
Another object of the present invention is to provide a fault reporting device and a fault reporting method that set the conditions for selectively reporting faults of a monitored device without error.
[0013]
Still another object of the present invention is to provide a fault reporting device and a fault reporting method that shorten the time required for maintenance.
[0014]
[Means for Solving the Problems]
The means for solving the problem will be described below using the numbers and symbols used in [Embodiments of the Invention]. These numbers and symbols have been added in order to clarify the correspondence between the description of [Claims] and [Embodiments of the Invention]. However, those numbers and symbols must not be used for interpreting the technical scope of the invention described in [Claims].
[0015]
According to an aspect of the present invention, a failure notification device (11) includes an input unit (21), a definition table (34, 35), a filter unit (22, 23), a notification unit (24), and a registration unit. (27, 28, 29, 31, 32), and automatically updates the fault type to be reported to the maintenance center (14) with the operation of the fault notification device (11). The fault information reported from the monitored device (12) to the maintenance center (14) includes a fault type indicating the type of fault that has occurred and an occurrence time indicating the time when the fault occurred. The input unit (21) collects failure information from the monitored device (12) monitored by the maintenance center (14). The definition tables (34, 35) define the types of faults to be reported to the maintenance center (14). The filter units (22, 23) determine the failure information to be reported to the maintenance center (14) from the failure information input to the input unit (21) based on the definition tables (34, 35). The reporting unit (24) reports failure information to a maintenance center (14) that monitors the monitored device (12) in response to the result determined by the filter units (22, 23). The registration unit (27, 28, 29, 31, 32) extracts a failure type of failure information to be reported to the maintenance center (14) based on the failure information collected by the input unit (21), and extracts the extracted failure. The definition table (35) is updated with the type.
[0016]
In the fault notification device (11) of the present invention, the registration unit (27, 28, 29, 31, 32) includes a data shaping unit (27), an association rule analysis unit (28), and a notification definition registration unit (29). And The data shaping section (27) stores the failure information in the related report log (31) every time the failure information is input. The correlation rule analysis unit (28) extracts, from the related report log (31), a correlation rule in which a related failure type of a related failure occurring before and after the failure is associated with the failure type of the failure, and extracts the extracted result as a correlation rule. It is stored in the table (32). The report definition registration unit (29) refers to the definition tables (34, 35) and the correlation rule table (32) to store the failure type of the failure information to be reported to the maintenance center (14) in the definition table (35). register.
[0017]
In the failure notification device (11) of the present invention, the first failure information of the first failure that has occurred in the monitored device (12) includes a first occurrence time and a first failure type. The second fault information of the second fault of the monitored device (12) that has occurred within the predetermined time T before and after the first occurrence time includes the second fault type. The data shaping unit (27) stores the first failure type, the first occurrence time, and the second failure type in the associated report log (31) in association with each other.
[0018]
In the trouble report device (11) of the present invention, the correlation rule analysis unit (28) extracts a correlation rule from a predetermined number of trouble information stored in the related report log (31). By limiting the number of pieces of fault information for extracting correlation rules, it is possible to cope with a change in fault occurrence due to a change in operation state and the like, and the number of samples to be analyzed can be secured, so that an improvement in accuracy can be expected.
[0019]
In the failure notification device (11) of the present invention, the correlation rule analysis unit (28) generates the failure information within a predetermined time from the time when the correlation rule is extracted from the failure information stored in the related notification log (31). The correlation rule is extracted from the failure information of the failed failure. By limiting the time of the failure information for extracting the correlation rule, it becomes possible to extract a correlation rule that matches the failure occurrence situation in the latest operation state.
[0020]
In the fault notification device (11) of the present invention, the definition tables (34, 35) include a notification definition table (34) and a correlation notification definition table (35). In the report definition table (34), a fault type to be reported to the maintenance center (14) is registered in advance. In the correlation report definition table (35), a failure type to be reported to the maintenance center (14) is registered by the report definition registration unit (29). The filter units (22, 23) include a report filter unit (22) and a correlation report filter unit (23). The report filter unit (22) causes the report unit (24) to report the failure information of the failure type registered in the report definition table (34) to the maintenance center (14). The correlation report filter section (23) causes the report section (24) to report the failure information of the failure type registered in the correlation report definition table (35) to the maintenance center (14).
[0021]
In the failure notification device (11) of the present invention, a flag is added to the failure information of a failure type registered in the correlation report definition table (35), indicating that the failure is reported with reference to the correlation report definition table (35). The reporting unit (24) reports the flagged failure information to the maintenance center (14).
[0022]
According to an aspect of the present invention, a failure notification method includes an input step, a filter step, a reporting step, and a registration step, and the fault types to be reported to the maintenance center (14) are automatically updated as the failure notification device (11) using the method operates. The fault information reported from the monitored device (12) to the maintenance center (14) includes a fault type indicating the type of fault that has occurred and an occurrence time indicating the time when the fault occurred. The input step collects fault information from the monitored device (12) monitored by the maintenance center (14). The definition tables (34, 35) define the types of faults to be reported to the maintenance center (14). The filter step determines, based on the definition tables (34, 35), which of the failure information input in the input step is to be reported to the maintenance center (14). The reporting step reports failure information to the maintenance center (14) that monitors the monitored device (12) in accordance with the result determined in the filter step. The registration step extracts, based on the failure information collected in the input step, the failure type of the failure information to be reported to the maintenance center (14), and updates the definition table (35) with the extracted failure type.
[0023]
In the failure notification method according to the present invention, the registration step includes a data shaping step, a correlation rule analysis step, and a report definition registration step. The data shaping step stores the failure information in the related report log (31) each time it is input. The correlation rule analysis step extracts, from the related report log (31), a correlation rule that associates the failure type of a failure with the related failure type of a related failure occurring before or after it, and stores the extracted result in the correlation rule table (32). The report definition registration step refers to the definition tables (34, 35) and the correlation rule table (32) and adds, to the definition table (35), the failure type of the failure information to be reported to the maintenance center (14).
[0024]
In the failure notification method of the present invention, the first failure information of the first failure that has occurred in the monitored device (12) includes a first occurrence time and a first failure type. The second fault information of the second fault of the monitored device (12) that has occurred within the predetermined time T before and after the first occurrence time includes the second fault type. In the data shaping step, the first failure type, the first occurrence time, and the second failure type are stored in the associated report log (31) in association with each other.
[0025]
In the failure notification method of the present invention, the correlation rule analysis step extracts a correlation rule from a predetermined number of pieces of failure information stored in the related notification log (31).
[0026]
In the failure notification method according to the present invention, the correlation rule analysis step extracts the correlation rule from, among the failure information stored in the related report log (31), the failure information of failures that occurred within a predetermined time before the time at which the correlation rule is extracted.
[0027]
In the failure notification method according to the present invention, the definition tables (34, 35) include a report definition table (34) in which failure types to be reported to the maintenance center (14) are registered in advance, and a correlation report definition table (35) in which failure types to be reported to the maintenance center (14) are registered by the registration step. The filter step includes a report filter step that causes the reporting step to report, to the maintenance center (14), the failure information of failure types registered in the report definition table (34), and a correlation report filter step that causes the reporting step to report, to the maintenance center (14), the failure information of failure types registered in the correlation report definition table (35).
[0028]
In the failure notification method according to the present invention, a flag is added to the failure information of a failure type registered in the correlation report definition table (35), indicating that the failure is reported with reference to the correlation report definition table (35). The reporting step reports the flagged failure information to the maintenance center (14). The flag makes it possible to distinguish whether a reported failure is one for which reporting was set in advance or one that was additionally registered as a related failure, which makes it possible to reduce the time required for maintenance work such as failure analysis.
[0029]
BEST MODE FOR CARRYING OUT THE INVENTION
An embodiment of the present invention will be described with the fault notification device of the present invention referred to as a monitoring manager. FIG. 1 is a block diagram showing the configuration of a fault monitoring system using the monitoring manager of the present invention. The fault monitoring system includes a monitored device 12, a monitoring manager 11, and a maintenance center 14. The monitored device 12 attempts to report the occurrence of a failure to the maintenance center 14 regardless of the severity of the failure. The failure information to be reported includes a failure type indicating the type of the failure that has occurred (for example, "disk failure", "database inaccessible", etc.) and the time of occurrence, and is sent to the monitoring manager 11. The monitoring manager 11 screens the received failure information and reports only valid failure information to the maintenance center 14. The maintenance center 14 performs maintenance operations based on the reported failure information.
[0030]
Although only one monitored device 12 is illustrated in FIG. 1, a plurality of monitored devices 12 may be provided. When the fault information reported from a plurality of monitored devices 12 is processed together, the number of samples for estimating the fault information selection conditions described below increases, so an improvement in accuracy can be expected. When the failure information is processed separately for each monitored device 12, the individual situation of each monitored device 12 can be accommodated.
[0031]
Further, the maintenance center 14 or a maintenance terminal may be located near the monitored device 12 or at a remote place, and a plurality of maintenance centers 14 may be provided. The connection between the monitored device 12 and the monitoring manager 11, and the connection between the monitoring manager 11 and the maintenance center 14, may be made through a communication network.
[0032]
Because the fault information is screened before it is reported to the maintenance center 14, fault information that is not important for maintaining the monitored device 12 is removed. This reduces the burden on the maintenance center 14 of analyzing fault information and shortens maintenance time.
[0033]
The monitoring manager 11 is an information processing apparatus exemplified by a workstation or the like, and includes a failure information input unit 21, a report filter unit 22, a correlation report filter unit 23, a failure information report unit 24, a data shaping unit 27, a correlation rule analysis unit 28, a report definition registration unit 29, a related report log 31, a correlation rule table 32, a report definition table 34, and a correlation report definition table 35.
[0034]
The report definition table 34 and the correlation report definition table 35 are definition tables that store conditions for selecting fault information. In these definition tables, fault types indicating the types of faults are stored in a list format. Fault information of the fault type registered in the definition table is reported to the maintenance center 14. Failure information of the failure type not registered in the definition table is discarded without being reported to the maintenance center 14.
[0035]
The report definition table 34 is a table in which fault types of faults directly related to maintenance of the monitored device 12 are registered in advance. The registered fault types are fixed and are not automatically updated. Maintenance personnel can of course maintain the table by registering fault types required for maintenance operation and deleting unnecessary ones, but care must be taken when doing so.
[0036]
The correlation report definition table 35 is a table in which fault types of faults indirectly related to the maintenance of the monitored device 12 are registered. A fault indirectly related to the maintenance of the monitored device 12 is a fault that is itself minor and negligible in maintenance operation, and so would not need to be reported to the maintenance center 14, but that is presumed to be related to the occurrence of a fault of a fault type registered in the report definition table 34. The correlation report definition table 35 is automatically updated by the report definition registration unit 29 based on the result analyzed by the correlation rule analysis unit 28.
[0037]
By separating the definition table into a fixed part and a variable part, it is possible to prevent an important fixed part from being erroneously set by the automatic update function. Moreover, the effect of preventing human error in maintenance can be expected by clarifying the fixed portion.
[0038]
The failure information input unit 21 has an interface with the monitored device 12 and collects failure information from the monitored device 12. The collected fault information is sent to the report filter unit 22 and the data shaping unit 27.
[0039]
The report filter unit 22 and the correlation report filter unit 23 are filters for selecting fault information. The report filter 22 receives the failure information from the failure information input unit 21, selects the failure information of the failure type registered in the report definition table 34, and causes the failure information reporting unit 24 to report to the maintenance center 14. The fault information of the fault type not registered in the report definition table 34 is sent to the correlation report filter unit 23.
[0040]
The correlation report filter unit 23 selects the failure information of the failure type registered in the correlation report definition table 35 from the failure information sent from the report filter unit 22 and causes the failure information reporting unit 24 to report to the maintenance center 14. Fault information of the fault type not registered in the correlation report definition table 35 is discarded here because it is not reported. The failure information to be reported is attached with a flag indicating that the failure information is reported to the maintenance center 14 with reference to the correlation report definition table 35. With this flag, the maintenance center 14 identifies that the failure is registered in the correlation report definition table 35, and uses it for preventive maintenance of the monitored device 12.
[0041]
Due to the filtering of the failure information, a failure that is not effective for maintenance of the monitored device 12 such as a minor failure is not reported to the maintenance center 14 and the load on the maintenance center 14 is reduced.
[0042]
The failure information reporting unit 24 has an interface with the maintenance center 14, and sends failure information sent from the report filter unit 22 and the correlation report filter unit 23 to the maintenance center 14.
[0043]
The data shaping section 27, the correlation rule analysis section 28, the report definition registration section 29, the related report log 31, and the correlation rule table 32 are registration sections that generate conditions for selecting failure information.
[0044]
The data shaping unit 27 stores the failure information sent from the failure information input unit 21 in the related report log 31. It also extracts from the related report log 31 the past failures that occurred within a predetermined time T of the occurrence time, and registers the failure type of the received failure information in the related report log 31 as a related failure of each extracted failure. Conversely, the failure type of each extracted failure is registered in the related report log 31 as a related failure of the received failure. The predetermined time T is described below as a fixed time for all failures, but it may differ depending on the failure type. When a different time Tn is set as the threshold for each failure type n, the window in which related failures are collected can be adjusted per failure type, so the analysis time of the correlation rule analysis unit 28 can be shortened, or correlated failures can be estimated in more detail.
[0045]
As shown in FIG. 2, the related notification log 31 stores the fault type and the occurrence time of the fault information sent from the fault information input unit 21 in association with up to n fault types of related faults. Here, for the sake of explanation, the number of related faults is fixed in a table format, but may be stored in a variable-length list format. If the length is variable, the memory area to be used can be saved if the number of related faults for each fault is significantly different.
[0046]
The correlation rule analysis unit 28 periodically analyzes the related report log 31 and extracts a rule of the relation between the failures. The association rule analysis method uses an association rule. The association rule indicates the fact that when an event A occurs, an event B also occurs, and is generally described by the following equation.
A⇒B
Here, given a predetermined time T, the correlation between fault A and fault B is written A⇒B when fault A occurs at time ta and fault B occurs within the interval T before or after ta, that is, when fault B occurs at a time tb satisfying
ta - T < tb < ta + T
Further, faults A and B that satisfy both correlation rules A⇒B and B⇒A are defined as being related to each other.
[0047]
The value of a rule is expressed by its certainty factor (confidence) and its support. The certainty factor is the ratio of the number b of cases in which event B occurred to the number a of cases in which event A occurred, and is calculated as b / a. The support is the ratio of the number b of data items satisfying A⇒B to the total number N of data items, and is calculated as b / N. For example, when failures occur as shown in FIG. 2, evaluating the correlation rule failure C ⇒ failure A gives confidence = 1/2 = 50% and support = 1/4 = 25%. The correlation rule analysis unit 28 holds thresholds for the certainty factor and the support, and notifies the report definition registration unit 29 of the correlation rules whose certainty factor and support are equal to or greater than the thresholds. Techniques for finding association rules whose certainty factor and support meet given thresholds are known; the Apriori algorithm is one example.
[0048]
The result of calculating the certainty factor and the support between the faults stored in the related report log 31 is stored in a correlation rule table 32 as shown in FIG. The correlation rule table 32 summarizes the fault type A, the fault type B related to the fault type A, the certainty factor, and the support.
[0049]
The report definition registration unit 29 registers, in the correlation report definition table 35, the fault types of faults to be reported, by referring to the report definition table 34 and following the correlation rules stored in the correlation rule table 32. A fault to be registered is one that is correlated with a fault of a fault type registered in the report definition table 34 but is not itself registered in the report definition table 34. In other words, it is a fault that would not be reported to the maintenance center 14 unless it were registered in the correlation report definition table 35: it need not be reported directly for maintenance and operation, but it can be a sign of an important fault.
[0050]
The monitoring manager 11 of the present invention stores the failure information input from the monitored device 12 in the related report log 31, analyzes the stored failure information for rules on correlated failures, and updates the definition table based on the analyzed correlation rules. It then selects (filters) the failure information input from the monitored device 12 with reference to the definition table updated in the course of operation, and reports the selected failure information to the maintenance center 14. These operations are described below.
[0051]
The operation of storing failure information input from the monitored device 12 in the related report log 31 repeats the procedure shown in the flowchart of FIG. 5 every time failure information is input. How failure information accumulates in the related report log 31 shown in FIG. 2 is described on the assumption that failures have occurred as shown in FIG. It is assumed that failure 1 of failure type C occurs at time t1, failure 2 of failure type A occurs at time t2 within time T of t1, failure 3 of failure type B occurs at time t3 within time T of t2, and failure 4 of failure type C occurs at time t4, which is time T or more after t3.
[0052]
When failure 1 occurs at time t1 and its failure information is input from the monitored device 12 to the failure information input unit 21, the data shaping unit 27 extracts the failure type C and the occurrence time t1 from the failure information and registers them in the related report log 31. When failure 1 occurs there is no other failure, so only registration is performed. Next, when failure 2 occurs at time t2 and its failure information is input from the monitored device 12 to the failure information input unit 21, the data shaping unit 27 extracts the failure type A and the occurrence time t2 from the failure information and registers them in the related report log 31 (step S11).
[0053]
In order to investigate the related trouble, the past trouble information in the related report log 31 is extracted. When the fault 2 is registered, the data of the fault 1 is taken out since the fault 1 is registered in the related report log 31. (Step S13)
[0054]
The occurrence time of the extracted data is checked to determine whether it lies within time T, that is, whether the extracted failure is a related failure (step S16). Since failure 1 occurred at time t1, its time difference from failure 2 is within T (step S16-YES), and failures 1 and 2 are registered as related failures: failure type A of failure 2 is registered as related failure type 1 of failure 1, and failure type C of failure 1 is registered as related failure type 1 of failure 2 (step S18). Since there is no more data in the related report log 31, the processing for failure 2 is complete.
[0055]
Next, when failure 3 occurs at time t3, the failure type B of failure 3 and the occurrence time t3 are first registered in the related report log 31, as for failures 1 and 2 (step S11). The related report log 31 is searched for past failures, and failure 2 of failure type A, which occurred at time t2, is extracted (step S13). Since time t2 is within time T of time t3 (step S16-YES), the related failure types are registered in the related report log 31: failure type B of failure 3 is registered as related failure type 2 of failure 2, and failure type A of failure 2 is registered as related failure type 1 of failure 3 (step S18).
[0056]
When the search for past failures continues, failure 1 of failure type C, which occurred at time t1, is extracted (step S13). Since time T or more has elapsed between time t1 and time t3, the processing for failure 3 ends (step S16-NO).
[0057]
Next, when failure 4 occurs at time t4, the failure type C of failure 4 and the occurrence time t4 are similarly registered in the related report log 31 (step S11). The related report log 31 is searched for past failures, and failure 3 of failure type B, which occurred at time t3, is extracted (step S13). Since time T or more has elapsed between time t3 and time t4, the processing for failure 4 ends (step S16-NO). In this way, the related failure types shown in FIG. 2 are registered.
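The accumulation of the related report log in the example above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the embodiment: the names `LogEntry` and `register_fault`, the numeric occurrence times, and the window value 10 are hypothetical and were chosen only so that the four failures satisfy the timing conditions described for failures 1 to 4. The log is assumed to be kept in order of occurrence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogEntry:
    """One row of the related report log: type, time, related types."""
    fault_type: str
    time: float
    related: List[str] = field(default_factory=list)


def register_fault(log: List[LogEntry], fault_type: str,
                   time: float, window: float) -> LogEntry:
    """Register a new failure and link it with past failures within `window`."""
    entry = LogEntry(fault_type, time)
    log.append(entry)                          # step S11: register the failure
    for past in reversed(log[:-1]):            # step S13: newest past failure first
        if time - past.time >= window:         # step S16-NO: outside time T, stop
            break
        past.related.append(fault_type)        # step S18: link in both directions
        entry.related.append(past.fault_type)
    return entry


# Hypothetical times: C at 0, A at 6, B at 12, C at 30, with T = 10.
log: List[LogEntry] = []
for fault_type, time in [("C", 0), ("A", 6), ("B", 12), ("C", 30)]:
    register_fault(log, fault_type, time, window=10)

for e in log:
    print(e.fault_type, e.time, e.related)
```

With these assumed times, failure 2 (type A) ends up with related types C and B, failures 1 and 3 each have related type A, and failure 4 has none, which matches the walkthrough above.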
[0058]
With reference to FIG. 6, an operation of analyzing a rule of a correlated fault from the fault information stored in the related notification log 31 and storing the rule in the correlation rule table 32 will be described. The analysis is periodically performed by the correlation rule analysis unit 28, but may be performed in parallel when the failure information is input and stored in the related notification log 31.
[0059]
The range over which rules are analyzed, that is, the range of data in the related report log 31, is specified by the number of stored records. In the following description, the analysis covers a predetermined number of failures; alternatively, the failures that occurred within a predetermined time before the analysis time may be targeted.
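The two ways of limiting the analysis range can be illustrated with a short Python sketch. The function names `last_n` and `within_period`, the tuple layout `(fault_type, time)`, and the sample values are assumptions made for illustration only.

```python
from typing import List, Tuple

Record = Tuple[str, float]  # (fault_type, occurrence_time), oldest first


def last_n(log: List[Record], n: int) -> List[Record]:
    """Range by count: the n most recently stored records."""
    return log[-n:] if n > 0 else []


def within_period(log: List[Record], now: float, period: float) -> List[Record]:
    """Range by time: records that occurred within `period` before `now`."""
    return [rec for rec in log if now - rec[1] <= period]


sample = [("C", 0), ("A", 6), ("B", 12), ("C", 30)]
print(last_n(sample, 2))
print(within_period(sample, now=30, period=20))
```

Limiting by count keeps the sample size constant, while limiting by time follows the latest operating state; which one suits a given system is left to the operator.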
[0060]
The correlation rule analysis unit 28 sets a pointer to the first failure in the range to be analyzed among the failures stored in the related report log 31, and clears the counter for the total number N of failures analyzed and the counters for the number of failures of each type. Here, it is assumed that the analysis starts from the oldest failure in the analysis range (step S21).
[0061]
The failure information pointed to by the pointer is extracted from the related report log 31, and the total number N of failures analyzed is incremented by one. For example, assume that failure 2 in FIG. 2 is pointed to (step S22).
[0062]
The data of the fault Ai (failure 2 in FIG. 2) is extracted, and the number ai of the fault types is incremented by one. In the fault 2 of FIG. 2, since the fault type is A, the counter for counting the number of fault types A is incremented by 1 (step S23).
[0063]
The data of the related fault Bij is extracted, and the number bij of the related fault types is incremented by one. In the fault 2 of FIG. 2, since the related fault type 1 is the fault type C, the counter of the related fault type C for the generated fault type A is incremented by 1 (step S25).
[0064]
In the fault 2 of FIG. 2, since there are up to two related faults (step S27-NO), the related fault type 2 is also counted. Since the related fault type 2 is the fault type B, the counter of the related fault type B for the generated fault type A is incremented by 1 (step S25).
[0065]
In failure 2 of FIG. 2, related failure types are registered only up to related failure type 2, so the processing for failure 2 ends and the pointer is moved to the next failure (failure 3 in FIG. 2) (step S27-YES).
[0066]
Since the next fault (fault 3 in FIG. 2) has been registered (step S28-NO), the next fault (fault 3 in FIG. 2) is similarly counted from step S22.
[0067]
When processing has reached the last failure in the analysis range (failure 4 in FIG. 2) (step S28-YES), the total number N of failures analyzed, the number ai of failures Ai, and the number bij of failures Bij related to failure Ai have all been counted, so the certainty factor (confidence) and the support are calculated for each rule. The certainty factor is the ratio of the number bij of cases in which failure Bij occurred to the number ai of cases in which failure Ai occurred, and is calculated as bij / ai. The support is the ratio of the number bij of data items satisfying Ai⇒Bij to the total number N of failures analyzed, and is calculated as bij / N. For example, when failures occur as shown in FIG. 2, evaluating the correlation rule failure C ⇒ failure A gives confidence = 1/2 = 50% and support = 1/4 = 25% (step S29).
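The counting in steps S21 to S29 can be sketched in Python as follows. This is an illustrative sketch, not the implementation of the embodiment: the function name `analyze`, the input layout `(fault_type, [related types])`, and the use of `collections.Counter` are assumptions. The input rows correspond to failures 1 to 4 as described for FIG. 2.

```python
from collections import Counter
from typing import Dict, List, Tuple

LogRow = Tuple[str, List[str]]  # (fault_type, related fault types)


def analyze(log: List[LogRow]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return {(Ai, Bij): (confidence, support)} for every related pair."""
    total = 0                      # N: total number of failures analyzed
    a: Counter = Counter()         # ai: number of failures of each type
    b: Counter = Counter()         # bij: number of related pairs (Ai, Bij)
    for fault_type, related in log:
        total += 1                 # step S22
        a[fault_type] += 1         # step S23
        for rel in related:        # steps S25 and S27: every related type
            b[(fault_type, rel)] += 1
    # step S29: confidence = bij / ai, support = bij / N
    return {pair: (n / a[pair[0]], n / total) for pair, n in b.items()}


fig2_log = [("C", ["A"]), ("A", ["C", "B"]), ("B", ["A"]), ("C", [])]
rules = analyze(fig2_log)
print(rules[("C", "A")])  # confidence 0.5, support 0.25
```

For the rule C ⇒ A this gives a confidence of 1/2 and a support of 1/4, in agreement with the worked figures in the text. Note that the sketch counts every related entry, as the flowchart does, so a failure type that appears twice among the related types of one failure is counted twice.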
[0068]
The correlation rules calculated by the correlation rule analysis unit 28 are stored in a correlation rule table 32 as shown in FIG. The stored rules enumerate every pair of failures related within time T, and therefore include relations between failures that merely coincided by chance. With reference to the support, the pairs that actually have a correlation are selected and treated as correlation rules. The certainty factor and the support in the correlation rule table 32 are used to judge whether a rule is valid; they are not needed if correlation rules are extracted by a procedure other than the one above. In this way, the correlation rules are registered in the correlation rule table 32.
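Selecting valid rules by threshold can be sketched as below. The function name `select_rules`, the threshold values, and the pair `("X", "Y")` are hypothetical and serve only to show a chance coincidence being dropped; the source does not specify threshold values.

```python
from typing import Dict, Tuple

Rules = Dict[Tuple[str, str], Tuple[float, float]]  # pair -> (confidence, support)


def select_rules(rules: Rules, min_confidence: float, min_support: float) -> Rules:
    """Keep only rules whose confidence and support both reach the thresholds."""
    return {
        pair: (conf, sup)
        for pair, (conf, sup) in rules.items()
        if conf >= min_confidence and sup >= min_support
    }


candidate_rules: Rules = {
    ("C", "A"): (0.5, 0.25),
    ("A", "C"): (1.0, 0.25),
    ("X", "Y"): (0.9, 0.01),   # hypothetical rare coincidence
}
print(select_rules(candidate_rules, min_confidence=0.6, min_support=0.1))
```

Raising the support threshold removes pairs that were seen together only rarely, while the confidence threshold removes pairs in which the related failure seldom accompanies the first one.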
[0069]
The operation of updating the definition table based on the analyzed association rule will be described with reference to FIG. The report definition registration unit 29 updates the correlation report definition table 35 based on the correlation rules registered in the correlation rule table 32 and the report definition table 34. In the correlation rule table 32, rules are registered in such a format that a failure of the failure type B can be estimated to occur before or after the failure of the failure type A. From the method of storing the related report log 31 and the method of generating the correlation rule, a rule is also registered that it can be estimated that a failure of the failure type A will occur before or after the failure of the failure type B. Therefore, the notification definition registration unit 29 only needs to register one of the failures registered in the correlation rule. If the correlation report definition table 35 is cleared in advance when the report definition registration unit 29 registers the correlation rule, the correlation report definition table 35 is created based on the correlation rule of the period in which the correlation rule was extracted. If you register without clearing, a correlation rule for a new period will be added to the past correlation rules. The selection should be made according to the operation state of the system to be applied.
[0070]
The association rule is extracted from the association rule table 32. It is assumed that the retrieved association rule is fault type A → failure type B. The notification definition registration unit 29 determines whether to register the failure type B in the correlation notification definition table 35 based on the registration status of the failure type A (step S31).
[0071]
It is determined whether failure type A is registered in the report definition table 34, in which the failure types of failures to be reported are registered (step S32). If failure type A is not registered in the report definition table 34, a failure of failure type A need not be reported to the maintenance center 14, so a failure of the correlated failure type B is not made a reporting target either, and the process proceeds to the determination of the next correlation rule (step S32-NO).
[0072]
If fault type A is registered in the report definition table 34, a fault of fault type A is a fault to be reported to the maintenance center 14, and a fault of the correlated fault type B is also determined to be a fault to be reported (step S32-YES).
[0073]
The fault type B correlated with fault type A is extracted. If fault type B is already registered as a fault to be reported, it need not be registered again, so it is confirmed that it has not yet been registered (step S34).
[0074]
First, it is confirmed that the information has not been registered in the report definition table 34. If the fault type B is registered in the report definition table 34, it is not necessary to register the fault type B, and the process proceeds to the next correlation rule determination (step S35-YES). If the failure type B is not registered in the report definition table 34, the registration status of the correlation report definition table 35 is confirmed (step S35-NO).
[0075]
If the fault type B has already been registered in the correlation report definition table 35, there is no need to register it, and the process proceeds to the next correlation rule determination (step S36-YES). If the failure type B is not registered in the correlation report definition table 35 (step S36-NO), the failure type B is registered in the correlation report definition table 35 (step S38). By registering in the correlation report definition table 35, a failure of the failure type B is reported to the maintenance center 14.
[0076]
Correlated fault types A and B both need to be registered in a definition table. In the method of the present invention, whenever the rule fault type A ⇒ fault type B exists, the rule fault type B ⇒ fault type A also exists, so both fault type A and fault type B end up registered in a definition table (in either the report definition table 34 or the correlation report definition table 35).
[0077]
The above is repeated until the end of the correlation rule (step S39-NO), and when all the correlation rules are determined and completed, all the correlation rules are reflected in the correlation report definition table 35 (step S39-YES). Thus, the correlation report definition table 35 is updated.
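The update procedure of steps S31 to S39 can be sketched in Python as follows. The function name `update_correlation_table`, the use of sets for the two definition tables, and the `clear_first` parameter (modelling the choice between clearing the table beforehand and adding to past rules) are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple


def update_correlation_table(rules: Iterable[Tuple[str, str]],
                             report_types: Set[str],
                             correlation_types: Set[str],
                             clear_first: bool = False) -> Set[str]:
    """Register in `correlation_types` each B of a rule A => B with A reportable."""
    if clear_first:
        correlation_types.clear()      # rebuild from the current rules only
    for a, b in rules:                 # step S31: take the next rule A => B
        if a not in report_types:      # step S32-NO: A is not a reported type
            continue
        if b in report_types:          # step S35-YES: B is already reported
            continue
        if b in correlation_types:     # step S36-YES: B is already registered
            continue
        correlation_types.add(b)       # step S38: register B
    return correlation_types


rules = [("A", "B"), ("B", "A"), ("C", "A"), ("A", "C")]
table_34 = {"A"}                       # fixed report definition table
table_35: Set[str] = set()             # correlation report definition table
print(sorted(update_correlation_table(rules, table_34, table_35)))
```

With fault type A fixed in the report definition table, the correlated types B and C are added to the correlation report definition table, while the reverse rules B ⇒ A and C ⇒ A are skipped at step S32.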
[0078]
The operation of selecting (filtering) the fault information input from the monitored device 12 by referring to the correlation report definition table 35, which is updated as the monitoring manager 11 operates, and to the fixedly defined report definition table 34, and of reporting the selected fault information to the maintenance center 14, will be described with reference to FIG. 8.
[0079]
The failure information input unit 21 receives the failure information from the monitored device 12 and sends it to the notification filter unit 22 (Step S41).
[0080]
The report filter unit 22 determines whether the fault type of the input fault information is registered in the report definition table 34. If it is registered in the report definition table 34, the input failure information is sent to the failure information reporting unit 24 to be reported to the maintenance center 14 (step S43-YES).
[0081]
Fault information whose fault type is not registered in the report definition table 34 is sent to the correlation report filter unit 23. The correlation report filter unit 23 determines whether the fault type of the input fault information is registered in the correlation report definition table 35. If it is registered (step S46-YES), a flag indicating that the fault type is one registered in the correlation report definition table 35 is added. The flagged failure information is sent to the failure information reporting unit 24 for reporting to the maintenance center 14 (step S48). The failure information reporting unit 24 reports the failure information to the maintenance center 14 (step S49).
[0082]
Fault information whose fault type is not registered in the correlation report definition table 35 is discarded, because there is no need to notify the maintenance center 14 of it (step S46-NO).
[0083]
In this way, a failure to be reported is reported to the maintenance center 14, and failure information that does not need to be reported is discarded.
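The two-stage selection described above can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary layout of a fault record, the flag key "correlated", and the sample fault type names are assumptions and do not appear in the specification.

```python
def filter_fault(fault, report_table, correlation_table):
    """Return the record to report to the maintenance center, or None.

    fault             -- dict with at least a "type" key (assumed layout)
    report_table      -- set standing in for report definition table 34
    correlation_table -- set standing in for correlation report definition table 35
    """
    fault_type = fault["type"]
    if fault_type in report_table:        # S43-YES: report as is
        return dict(fault)
    if fault_type in correlation_table:   # registered in table 35
        flagged = dict(fault)
        flagged["correlated"] = True      # flag added before reporting (S48)
        return flagged
    return None                           # S46-NO: discard


# Hypothetical fault type names, for illustration only.
report_table_34 = {"LINK_DOWN"}
correlation_table_35 = {"FAN_ALARM"}
direct = filter_fault({"type": "LINK_DOWN"}, report_table_34, correlation_table_35)
flagged = filter_fault({"type": "FAN_ALARM"}, report_table_34, correlation_table_35)
dropped = filter_fault({"type": "DOOR_OPEN"}, report_table_34, correlation_table_35)
```

The flag lets the receiving side tell a fault reported through the automatically built table from one reported through the fixed table.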
[0084]
As described above, the monitoring manager of the present invention can set fault report conditions automatically, and can set filter conditions that evolve into optimum report conditions as the manager operates, so that faults are reported under those conditions. Therefore, the load on the maintenance center can be reduced, and the maintenance time can be shortened.
[0085]
[Effects of the invention]
According to the present invention, a filter function can discard those fault reports collected from a monitored device that need not be reported to the maintenance center, so that a fault reporting device and a fault reporting method that reduce the burden on the maintenance center can be provided.
[0086]
Further, according to the present invention, it is possible to provide a failure reporting device and a failure reporting method that automatically set, instead of manually, conditions for selectively reporting a failure of a monitored device that differs for each system.
[0087]
Further, according to the present invention, it is possible to provide a failure reporting device and a failure reporting method that set the conditions for selectively reporting failures of a monitored device without error, by eliminating manual condition setting, which is prone to mistakes.
[0088]
Further, according to the present invention, it is possible to provide a failure notification device and a failure notification method that can predict the occurrence of a failure and investigate its cause by examining related failures, thereby shortening maintenance response time.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a fault monitoring system and a configuration of a monitoring manager according to an embodiment of the present invention.
FIG. 2 is a diagram showing a configuration of the related report log.
FIG. 3 is a diagram showing a configuration of the correlation rule table.
FIG. 4 is a diagram illustrating the time relationship of failure occurrences, for explaining how failure information is accumulated in the related report log.
FIG. 5 is a flowchart showing an operation of accumulating failure information in the related report log.
FIG. 6 is a flowchart showing an operation of analyzing the correlation rule.
FIG. 7 is a flowchart showing an operation of registering in the correlation report definition table.
FIG. 8 is a flowchart showing the notification operation.
[Explanation of symbols]
11 Monitoring manager
12 Monitored device
14 Maintenance center
21 Failure information input unit
22 Report filter unit
23 Correlation report filter unit
24 Failure information reporting unit
27 Data shaping unit
28 Correlation rule analysis unit
29 Report definition registration unit
31 Related report log
32 Correlation rule table
34 Report definition table
35 Correlation report definition table

Claims

1. An input unit configured to collect, from the monitored device, fault information including a fault type indicating a type of the fault that has occurred and an occurrence time indicating the time when the fault has occurred;
A definition table defining the fault type of the fault information to be reported;
A filter unit that determines the fault information to be reported based on the definition table,
A reporting unit that reports the failure information to a maintenance center that monitors the monitored device in response to a result determined by the filter unit;
A registration unit that extracts the failure type to be notified to the maintenance center based on the failure information collected by the input unit, and updates the definition table with the extracted failure type,
A failure reporting device that automatically updates the failure type of the failure information to be reported to the maintenance center with the operation of the failure reporting device.

2. The failure reporting device according to claim 1, wherein the registration unit includes:
A data shaping unit that stores the failure information in a related report log every time the failure information is input;
A correlation rule analysis unit that extracts, from the related report log, a correlation rule that associates a related failure type of a related failure occurring before and after the failure with the failure type of the failure, and stores the extracted result in a correlation rule table; and
A report definition registration unit that registers the failure type of the failure information to be reported to the maintenance center in the definition table with reference to the definition table and the correlation rule table.

3. The first failure information of the first failure that has occurred in the monitored device includes a first occurrence time and a first failure type,
The second failure information of the second failure of the monitored device that has occurred within a predetermined time T before and after the first occurrence time includes a second failure type,
The failure reporting device according to claim 2, wherein the data shaping unit stores the first failure type, the first occurrence time, and the second failure type in the related report log in association with each other.

4. The failure reporting device according to claim 2, wherein the correlation rule analysis unit extracts the correlation rule from a predetermined number of pieces of the failure information stored in the related report log.

5. The failure reporting device according to claim 2 or 3, wherein the correlation rule analysis unit extracts the correlation rule from, among the failure information stored in the related report log, the failure information of failures that occurred within a predetermined time from the time at which the correlation rule is extracted.

6. The failure reporting device according to claim 2, wherein
The definition table includes:
A report definition table in which the fault types to be reported to the maintenance center are registered in advance; and
A correlation report definition table in which the fault types to be reported to the maintenance center are registered by the report definition registration unit, and
The filter unit includes:
A report filter unit that causes the reporting unit to report, to the maintenance center, the failure information of a failure type registered in the report definition table; and
A correlation report filter unit that causes the reporting unit to report, to the maintenance center, the failure information of a failure type registered in the correlation report definition table.

7. The failure reporting device according to claim 6, wherein the reporting unit reports, to the maintenance center, the failure information to which a flag has been added indicating that the report was made with reference to the correlation report definition table.

8. An input step of collecting fault information including a fault type indicating the type of the fault that has occurred and an occurrence time indicating the time at which the fault has occurred from the monitored device;
Based on a definition table that defines the fault type of the fault information to be reported, a filter step of determining the fault information to be reported,
A reporting step of reporting the failure information to a maintenance center that monitors the monitored device in response to a result determined in the filtering step;
A registration step of extracting the fault type of the fault information to be notified to the maintenance center based on the fault information collected in the input step, and updating the definition table with the extracted fault type,
A failure notification method for automatically updating the failure type of the failure information to be reported to the maintenance center with the operation of the failure notification device using the method.

9. The failure notification method according to claim 8, wherein the registration step includes:
A data shaping step of storing the failure information in a related report log each time the failure information is input;
A correlation rule analysis step of extracting, from the related report log, a correlation rule in which a related failure type of a related failure occurring before and after the failure is associated with the failure type of the failure, and storing the extracted result in a correlation rule table; and
A report definition registration step of referring to the definition table and the correlation rule table and registering the failure type of the failure information to be reported to the maintenance center in the definition table.

10. The first fault information of the first fault that has occurred in the monitored device includes a first occurrence time and a first fault type,
The second failure information of the second failure of the monitored device that has occurred within a predetermined time T before and after the first occurrence time includes a second failure type,
The failure notification method according to claim 9, wherein the data shaping step stores the first failure type, the first occurrence time, and the second failure type in the related report log in association with each other.

11. The failure notification method according to claim 9, wherein the correlation rule analysis step extracts the correlation rule from a predetermined number of pieces of the failure information stored in the related report log.

12. The failure notification method according to claim 9 or 10, wherein the correlation rule analysis step extracts the correlation rule from, among the failure information stored in the related report log, the failure information of failures that occurred within a predetermined time from the time at which the correlation rule is extracted.

13. The failure notification method according to any one of claims 9 to 12, wherein
The definition table includes:
A report definition table in which the fault types to be reported to the maintenance center are registered in advance; and
A correlation report definition table in which the fault types to be reported to the maintenance center are registered by the report definition registration step, and
The filter step includes:
A report filter step of causing the failure information of a failure type registered in the report definition table to be reported to the maintenance center by the reporting step; and
A correlation report filter step of causing the failure information of a failure type registered in the correlation report definition table to be reported to the maintenance center by the reporting step.

14. The failure notification method according to claim 13, wherein the reporting step reports, to the maintenance center, the failure information to which a flag has been added indicating that the report was made with reference to the correlation report definition table.