JP4983806B2

JP4983806B2 - System monitoring apparatus and monitoring method using dual timer

Info

Publication number: JP4983806B2
Application number: JP2008549181A
Authority: JP
Inventors: 佳生廣瀬
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2006-12-15
Filing date: 2006-12-15
Publication date: 2012-07-25
Anticipated expiration: 2026-12-15
Also published as: JPWO2008072350A1; WO2008072350A1

Description

The present invention relates to a system monitoring scheme. More specifically, it relates to a system monitoring apparatus and a monitoring method that, in a system which periodically performs predetermined processing in response to interrupts from a timer, appropriately detect a timer failure so that necessary actions such as a system shutdown can be taken.

When a system needs to perform some operation periodically, a common approach is to use a timer that periodically interrupts the CPU of the processor controlling the system. To raise reliability, it is also common to duplicate or triplicate control functions even at some extra cost. The same applies to the timer that interrupts the CPU, and the following documents describe prior art in which the timer is duplicated to improve its reliability.

In Patent Document 1, with the configuration shown in FIG. 1, the two timers are given different timeouts. As shown in FIG. 2, timer T1 is given the timeout A and timer T2 the timeout B, where A < B. Each timer is assumed to count up from 0 and raise an interrupt when it reaches its set timeout. When time A has elapsed after the timers are started, timer T1 times out and interrupts the CPU. On this interrupt, the CPU reads the current value of timer T2. If the value read is A′, then A < A′ should hold as long as timer T2 is operating normally. A timer failure can therefore be concluded if either of the following is detected:
- timer T2 times out before timer T1;
- after timer T1 times out, the value of timer T2 does not satisfy A < A′.
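The failure test of Patent Document 1 reduces to a small predicate over the two conditions listed above. This is a minimal Python sketch; the function name and parameters are hypothetical, and it models only the check performed in T1's interrupt handler, not the timer hardware.

```python
def prior_art_1_timer_failed(t2_expired_first: bool, a: int, a_prime: int) -> bool:
    """Failure test described for Patent Document 1 (illustrative sketch).

    t2_expired_first: True if timer T2 (timeout B) interrupted before timer T1 (timeout A).
    a: timeout A programmed into timer T1 (A < B).
    a_prime: count A' read from timer T2 in T1's interrupt handler.
    """
    if t2_expired_first:
        # Condition 1: T2 should never expire before T1, because A < B.
        return True
    # Condition 2: a healthy T2 must satisfy A < A' when T1's interrupt is serviced.
    return not (a < a_prime)


print(prior_art_1_timer_failed(False, 100, 103))  # False: both timers consistent
print(prior_art_1_timer_failed(False, 100, 60))   # True: T2 is running slow or has stopped
```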

In Patent Document 1, if timer T1 times out and both timers are judged to be operating normally, the next cycle sets timeout A on timer T2 and timeout B on timer T1, so the roles of the two timers are swapped every cycle. Consequently, the timers must be programmed twice for every timer interrupt whose purpose is to make the CPU perform its actual work, which makes the processing cumbersome.

FIG. 3 shows the configuration and FIG. 4 the time chart of Patent Document 2, another prior art. Here, timeouts are set on the two timers T1 and T2 so that T1 times out first. When T1 times out, the MPU receives the interrupt, turns the work flag ON, and re-programs T1 so that it times out after T2. Next, when T2 times out and raises an interrupt, the MPU checks the work flag and, if it is ON, turns it OFF; if the flag was already OFF, timer T1 is judged to have failed. If operation is normal, T1 then times out again and raises an interrupt. The MPU checks that the work flag is OFF; if it is ON, timer T2 is judged to have failed. If it is OFF, the MPU returns to the initial state and re-programs T1 and T2 so that T1 times out first. Patent Document 2 therefore also requires the timers to be programmed several times (for example, three times) for every timer interrupt whose purpose is to make the MPU perform its actual work, which makes the processing cumbersome.
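The work-flag protocol of Patent Document 2 can be viewed as a state machine over the sequence of interrupting timers. The Python sketch below is illustrative only: the `in_check_phase` variable, which tells the handler whether a T1 interrupt is the first or second of a cycle, is an assumption (the description does not say how the MPU tracks this), and the re-programming of the timers is reduced to comments.

```python
from typing import Iterable, Optional


def prior_art_2_diagnose(events: Iterable[str]) -> Optional[str]:
    """Replay the work-flag protocol of Patent Document 2 (illustrative sketch).

    events: interrupting timers in arrival order, each "T1" or "T2".
    Returns "T1 failed" or "T2 failed" when the protocol detects a failure,
    or None if every interrupt was consistent.
    """
    work_flag = False
    in_check_phase = False  # hypothetical: True once T1 has fired and been re-armed behind T2
    for timer in events:
        if timer == "T1":
            if not in_check_phase:
                work_flag = True        # first T1 of the cycle: set flag, re-arm T1 to expire after T2
                in_check_phase = True
            else:
                if work_flag:
                    return "T2 failed"  # T2 never cleared the flag
                in_check_phase = False  # flag OFF: re-arm both timers so T1 expires first again
        elif timer == "T2":
            if not work_flag:
                return "T1 failed"      # T1 never set the flag before T2 expired
            work_flag = False
        else:
            raise ValueError(f"unknown timer {timer!r}")
    return None


print(prior_art_2_diagnose(["T1", "T2", "T1"] * 3))  # None
print(prior_art_2_diagnose(["T1", "T1"]))             # T2 failed
```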

As described above, the prior art of Patent Documents 1 and 2 has the problem that the timers must be programmed several times for each interrupt whose purpose is to make the processor perform the required work, which complicates processing.

Furthermore, neither patent document can determine which of the duplicated timers has failed, so when a timer failure occurs the only option is essentially to shut down the system. Patent Document 2 states that the failed timer can be identified, but this applies only when a timer fails by no longer raising interrupts; if, for example, a timer starts raising interrupts earlier than its set time, the failed timer cannot always be identified.
Patent Document 1: JP-A-60-059447, “Microcomputer System”
Patent Document 2: JP-A-11-65986, “Timer Fault Detection System, Detection Method, and Recording Medium Recording Program for Executing Detection Method”

In view of the above problems, the objects of the present invention are to reduce the overhead of setting timer timeouts and to make it possible to identify which of the duplicated timers has failed.

The system monitoring apparatus of the present invention outputs interrupt signals from timers to make a processor in the monitored system execute predetermined processing. It includes at least two timers that, using the timeout reload function, repeatedly output an interrupt signal at a common interval (2T), with the two timers offset from each other by half that interval (T). Each time an interrupt signal arrives from either timer, the processor determines whether either of the two timers has failed.

The system monitoring apparatus of the present invention further includes a flag register that, on each interrupt signal from either timer, stores a flag indicating the identifier of the timer that output it. When the next interrupt signal arrives, the processor compares the identifier of the timer that output it with the contents of the flag register to determine whether a timer has failed.

The system monitoring apparatus of the present invention further includes a memory that, on each interrupt signal from the two timers, stores the time indicated by the system clock provided in the system. The processor identifies which of the duplicated timers has failed based on the result of the timer-identifier comparison described above and on the difference between the system-clock time at the next interrupt and the time of the previous interrupt stored in the memory.

Thus, in the present invention, timer timeouts are basically set through the timer's reload function, which reduces the overhead of programming the timers.

Comparing the identifier of the interrupting timer with the contents of the flag register makes it possible, for example, to detect a timer that has stopped outputting interrupt signals. Comparing the time of an interrupt with the time of the previous interrupt makes it possible, for example, to identify a failed timer that raised interrupts at intervals differing greatly from the set timeout of 2T.

According to the present invention, using the timer's timeout reload function greatly reduces the overhead of setting timer timeouts compared with the prior art. In addition, the contents of the flag register and the time indicated by the system clock can be used to identify which of the duplicated timers has failed, so that even if one timer fails, system operation can continue using the timer that has not failed. This contributes greatly to the practicality of system monitoring apparatuses that use duplicated timers.

FIG. 1 is a block diagram of the system monitoring scheme in a first conventional example.
FIG. 2 is an explanatory diagram of the timeout setting method in the first conventional example.
FIG. 3 is a block diagram of the system monitoring scheme in a second conventional example.
FIG. 4 is an explanatory diagram of the timeout setting method in the second conventional example.
FIG. 5 is a block diagram of the configuration of a first embodiment.
FIG. 6 is a flowchart of the main routine of the timer failure detection process in the first embodiment.
FIG. 7 is a flowchart of interrupt process 1 for FIG. 6.
FIG. 8 is a flowchart of interrupt process 2 for FIG. 6.
FIG. 9 is a block diagram of a first example of a high-reliability timer.
FIG. 10 is a block diagram of a second example of the high-reliability timer.
FIG. 11 is a block diagram of the configuration of a second embodiment.
FIG. 12 is a flowchart of interrupt process 1 in the second embodiment.
FIG. 13 is a flowchart of interrupt process 2 in the second embodiment.
FIG. 14 is a block diagram of the configuration of a third embodiment.
FIG. 15 is a flowchart of interrupt process 2 in the third embodiment.

FIG. 5 is a block diagram of the configuration of the first embodiment of the present invention. The first embodiment comprises two timers 1 and 2, an interrupt controller 3, and a processor 4. The processor 4 is connected to the timers 1 and 2 by a bus 5; each timer sends an interrupt notification signal to the interrupt controller 3, which in turn sends an interrupt control signal to the processor 4. The processor 4 includes a CPU 6 that, in response to interrupts from the timers 1 and 2, determines whether a timer has failed and executes the processing required to carry out the work predetermined as interrupt processing. The processor 4 naturally also includes ROM and RAM as memory, an input/output unit, and so on, which are not shown.

In FIG. 5, the system monitoring apparatus using the duplicated timers of the present invention can be regarded as consisting of the timers 1 and 2 and the interrupt controller 3, or as a processor system that includes all the components of FIG. 5 formed on a single chip.

The interrupt controller 3 performs arbitration when the timers 1 and 2 output interrupt (notification) signals simultaneously, or when one timer outputs an interrupt signal while the interrupt processing for the other timer's interrupt is still running. As the result of this arbitration, it outputs to the CPU 6 in the processor 4 an interrupt control signal indicating, among other things, the identifier of the timer that output the interrupt signal. The interrupt controller is not directly related to the timer failure detection scheme of the present invention.

FIGS. 6 to 8 are flowcharts of the timer failure detection process in the first embodiment. When processing starts in the main routine of FIG. 6, step S1 sets the timeout of timer 1 to T and the timeout of timer 2 to 2T, with timer 1 configured without the reload function and timer 2 with it. The reload function automatically re-loads a preset timeout, held for example in a reload register, into the timer when it times out, so that the timer keeps running. Timer 2, set to 2T in step S1, therefore times out every 2T and periodically supplies an interrupt signal to the interrupt controller 3.

Next, in step S2, the system waits for an interrupt from timer 1. The interrupt table, held in a memory (not shown) in the processor 4, is set so that when timer 1 outputs an interrupt signal, that is, when it raises an interrupt, the CPU 6 executes interrupt process 1.

When time T has elapsed since the main routine of FIG. 6 started, timer 1 times out and raises an interrupt, which starts interrupt process 1 of FIG. 7 as a subroutine. In interrupt process 1, step S6 first sets the timeout of timer 1 to 2T instead of the T used when the main routine started, and enables the reload function for timer 1. In addition, a flag register holds a previous timer flag that identifies, at the next interrupt, which timer raised the previous one; this register is provided, for example, in the system monitoring apparatus formed by timers 1 and 2 and the interrupt controller 3. The CPU 6 in the processor 4 sets this flag register via the bus 5 to "0", the identifier of timer 1, which raised the current interrupt.

Next, in step S7, the interrupt table is changed so that interrupt process 2 is executed on subsequent interrupts, the processing for the predetermined work that the timer interrupt is meant to trigger is started, and control returns to the main routine. As a result of the above processing, timer 1 and timer 2 both output interrupt signals to the interrupt controller 3 at the same interval 2T, offset from each other by half that interval, that is, by T.
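The combined effect of steps S1, S6 and S7 can be checked with a small tick-based simulation. This is a hypothetical software model, not the hardware: `ReloadTimer` is a down counter that optionally copies its reload register back into the counter on expiry, time advances in integer ticks, and the re-programming of timer 1 in interrupt process 1 is modelled by replacing it with a periodic timer.

```python
from typing import List, Tuple


class ReloadTimer:
    """Down counter with an optional reload register (hypothetical software model)."""

    def __init__(self, timeout: int, reload: bool):
        self.count = timeout
        self.reload_value = timeout
        self.reload = reload
        self.running = True

    def tick(self) -> bool:
        """Advance one tick; return True if the timer expires (raises an interrupt)."""
        if not self.running:
            return False
        self.count -= 1
        if self.count > 0:
            return False
        if self.reload:
            self.count = self.reload_value  # re-armed automatically, no CPU work needed
        else:
            self.running = False            # one-shot timer stops until software re-arms it
        return True


def simulate(T: int, horizon: int) -> List[Tuple[int, int]]:
    """Return (time, timer_id) interrupts; timer 1 has ID 0 and timer 2 has ID 1."""
    timer1 = ReloadTimer(T, reload=False)      # step S1: timeout T, no reload
    timer2 = ReloadTimer(2 * T, reload=True)   # step S1: timeout 2T, reload enabled
    first_interrupt_handled = False
    events = []
    for now in range(1, horizon + 1):
        if timer1.tick():
            events.append((now, 0))
            if not first_interrupt_handled:
                # Interrupt process 1 (step S6): timeout 2T with reload enabled.
                timer1 = ReloadTimer(2 * T, reload=True)
                first_interrupt_handled = True
        if timer2.tick():
            events.append((now, 1))
    return events


print(simulate(3, 15))  # [(3, 0), (6, 1), (9, 0), (12, 1), (15, 0)]
```

After the one-time re-programming, no further timer writes occur: the interrupts alternate between the two timers every T, and each timer fires every 2T.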

Next, in step S3 of FIG. 6, the system waits for an interrupt from timer 1 or timer 2. If both timers are operating normally, timer 2 times out after a further time T and raises an interrupt, and interrupt process 2 is executed.

FIG. 8 is a flowchart of interrupt process 2 as a subroutine. When interrupt process 2 starts, step S10 checks the identifier (timer ID) of the timer that raised the interrupt, and step S11 determines whether the timer ID matches the value of the previous timer flag.

If the timers are operating normally, the timer ID here is "1", the ID of timer 2, and the previous timer flag is "0", so the two values do not match. Step S12 then inverts the previous timer flag, that is, sets it to "1"; finally, the processing required for the work the timer interrupt should trigger is started, and control returns to the main routine.

As long as the timers keep operating normally, timer 1 times out after another time T and the same interrupt process 2 is repeated. If a timer fails, for example if timer 1 fails and stops raising interrupts, consecutive interrupts come from timer 2. Alternatively, if timer 1 starts timing out earlier than its set time, then after some time has passed, depending on how much shorter its interval has become, consecutive interrupts come from timer 1.

When a timer failure thus causes consecutive interrupts from the same timer, step S11 of FIG. 8 finds that the timer ID matches the previous timer flag, and step S13 orders the system to stop.
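The following Python sketch replays interrupt process 2 against the two failure modes described above. The helper names and interrupt times are hypothetical; timer 1 has ID 0 and timer 2 has ID 1, and the first event is assumed to be the timer-1 interrupt that interrupt process 1 recorded in the previous timer flag.

```python
from typing import Iterable, List, Optional, Tuple


def merge(t1_times: Iterable[int], t2_times: Iterable[int]) -> List[Tuple[int, int]]:
    """Interleave the two timers' interrupts into arrival order as (time, timer_id)."""
    return sorted([(t, 0) for t in t1_times] + [(t, 1) for t in t2_times])


def first_embodiment_stop_time(events: List[Tuple[int, int]]) -> Optional[int]:
    """Return the time at which step S13 would stop the system, or None."""
    previous_timer_flag = events[0][1]       # set to 0 by interrupt process 1
    for time, timer_id in events[1:]:
        if timer_id == previous_timer_flag:  # step S11: same timer twice in a row
            return time                      # step S13: order a system stop
        previous_timer_flag ^= 1             # step S12: invert the flag
    return None


T, HORIZON = 10, 100
healthy = merge(range(T, HORIZON + 1, 2 * T), range(2 * T, HORIZON + 1, 2 * T))
# Timer 1 stops after its first interrupt.
timer1_dead = merge([T], range(2 * T, HORIZON + 1, 2 * T))
# Timer 1 runs fast: period 16 instead of 2T = 20.
timer1_fast = merge(range(T, HORIZON + 1, 16), range(2 * T, HORIZON + 1, 2 * T))

print(first_embodiment_stop_time(healthy))      # None
print(first_embodiment_stop_time(timer1_dead))  # 40: timer 2 interrupts twice in a row
print(first_embodiment_stop_time(timer1_fast))  # 58: timer 1 interrupts twice in a row
```

The repeating timer is the healthy one in the first failure case and the faulty one in the second.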

If it were possible to determine which of timer 1 and timer 2 had failed, the failed timer could be isolated and system operation continued with the timer that has not failed. However, as described above, depending on the failure mode the consecutive interrupts may come either from the failed timer or from the healthy one. The first embodiment therefore cannot identify the failed timer, and the system is stopped as soon as either timer fails.

Functionally, the first embodiment provides a system monitoring function equivalent to that of the conventional examples. The conventional examples had to program the timers at least twice for every timer interrupt, whereas the first embodiment, by using the timers' reload function, greatly reduces the number of timeout settings and thus markedly reduces the timeout-setting overhead of the system monitoring apparatus.

Because the embodiment of FIG. 5 uses two ordinary timers with a reload function, timer 1 must first be given a timeout of T and then a timeout of 2T. To simplify processing further, the high-reliability timer of FIG. 9 or FIG. 10 can be used, so that, for example, at system start-up the CPU 6 sets the timeouts of both timers at once by issuing a single command to timers 1 and 2.

FIG. 9 is a block diagram of a first configuration example of the high-reliability timer. The high-reliability timer 10 consists of two timers 11 and 12, and the CPU 6 of FIG. 5 sends the same command to both timers via the bus 5. As in FIG. 5, timers 11 and 12 each supply an interrupt signal to the interrupt controller 3.

Of the two timers 11 and 12, timer 12 has the same configuration as an ordinary timer with a reload function. When the CPU 6 sends timer 12 a command to set a timeout of 2T, the value 2T is stored in the reload register 15 and also loaded into the counter 16, for example through a selector. Assuming that the counter 16 is a down counter, when it has counted down for 2T and its value reaches "0", the zero-detection circuit 17 outputs an interrupt signal to the interrupt controller 3. At the moment zero is detected, the zero-detection circuit 17 also sends a set signal to a set terminal (not shown) of the counter 16, the contents of the reload register 15 are loaded into the counter 16, and the countdown continues.

Timer 11, by contrast, additionally includes a 1-bit right-shift circuit 18 as a feature specific to the present invention. When timer 11 receives a command from the CPU 6 via the bus 5 to set the time 2T, the right-shift circuit 18 divides that value by 2, and the result, T, is loaded into the counter 16, for example through a selector. When the counter 16 has counted for T and its value reaches "0", the zero-detection circuit 17 supplies an interrupt signal to the interrupt controller 3. The value 2T is stored in the reload register 15 when the command is received. When the zero-detection circuit 17 sends the set signal to the counter 16, the value 2T held in the reload register 15 is loaded into the counter 16 through the selector, and the counter 16 continues counting down.
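The command handling of the two timers in FIG. 9 amounts to choosing what goes into the counter 16 and the reload register 15. A minimal Python sketch with hypothetical function names, assuming the commanded value 2T is an even number of ticks so that the right shift divides it exactly:

```python
from typing import List, Tuple


def load_timer11(command: int) -> Tuple[int, int]:
    """Timer 11: counter gets command >> 1 (= T), reload register gets command (= 2T)."""
    return command >> 1, command


def load_timer12(command: int) -> Tuple[int, int]:
    """Timer 12 (ordinary reload timer): counter and reload register both get 2T."""
    return command, command


def first_expiries(counter: int, reload: int, n: int) -> List[int]:
    """First n expiry times of a reload down counter loaded as (counter, reload)."""
    return [counter + k * reload for k in range(n)]


two_t = 200
print(first_expiries(*load_timer11(two_t), 3))  # [100, 300, 500]
print(first_expiries(*load_timer12(two_t), 3))  # [200, 400, 600]
```

A single command thus yields two timers with the same period 2T and a phase offset of T.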

FIG. 10 is a block diagram of a second example of the high-reliability timer. The high-reliability timer 20 consists of two timers 21 and 22. Neither of these is an ordinary conventional timer: each includes a 1-bit left-shift circuit 25, a configuration specific to the present invention.

The CPU 6 of FIG. 5 sends the high-reliability timer 20 of FIG. 10 a command to set the time T as the timeout. In timer 21, the value T is loaded into the counter 16, and the value 2T, obtained by the left-shift circuit 25 multiplying T by 2, is stored in the reload register 15. When the count of the counter 16 reaches "0", the contents of the reload register 15 are loaded into the counter 16, as described above.

In timer 22, the left-shift circuit 25 produces the value 2T when the command arrives from the CPU 6; this value is loaded directly into the counter 16 and, at the same time, stored in the reload register 15. When the counter 16 counts down and the zero-detection circuit 17 detects a count of "0", the contents of the reload register 15 are loaded into the counter 16 and the countdown continues.
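For the FIG. 10 variant the command carries T instead of 2T. A minimal Python sketch of what each timer loads, with hypothetical function names, returning (counter value, reload register value):

```python
from typing import Tuple


def load_timer21(command: int) -> Tuple[int, int]:
    """Timer 21: counter gets T, reload register gets T << 1 (= 2T)."""
    return command, command << 1


def load_timer22(command: int) -> Tuple[int, int]:
    """Timer 22: counter and reload register both get T << 1 (= 2T)."""
    doubled = command << 1
    return doubled, doubled


t = 100
print(load_timer21(t))  # (100, 200): first expiry after T, then every 2T
print(load_timer22(t))  # (200, 200): first expiry after 2T, then every 2T
```

Compared with the right-shift design of FIG. 9, doubling loses no bit for odd values of T; this sketch assumes 2T still fits in the counter width.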

The first embodiment described above cannot identify which of the two timers has failed, so system operation is stopped as soon as a timer failure is detected. A second embodiment is described next, in which the failed timer can be identified so that, even if one timer fails, system operation can continue using the other timer. Note that although the flag register was placed in the system monitoring apparatus in the first embodiment, it could of course be provided in the processor 4 of FIG. 5 instead.

FIG. 11 is a block diagram of the configuration of the second embodiment. It differs from FIG. 5 (first embodiment) in that a system clock 30, which provides a unified time for the whole system, is additionally connected to the bus 5. In the second embodiment, in addition to comparing the timer ID with the previous timer flag as in the first embodiment, the CPU 6 in the processor 4 compares the time at which the interrupt control signal arrives from the interrupt controller 3 with the arrival time (Tprev) of the previous interrupt control signal, stored for example in a memory (not shown) in the system monitoring apparatus, to determine which of the two timers has failed.

The flowchart of the main routine of the timer failure detection process in the second embodiment is the same as FIG. 6 for the first embodiment, so its description is omitted.

FIG. 12 is a flowchart of interrupt process 1 in the second embodiment. This process starts when timer 1 raises an interrupt, as in step S2 of FIG. 6 in the first embodiment. First, in step S16, as in step S6 of FIG. 7, the timeout of timer 1 is set to 2T, the reload function is enabled, and the previous timer flag is set to "0", the identifier of timer 1. Then, in step S17, in addition to the processing of step S7 of FIG. 7, the current time Tnow indicated by the system clock is read and stored in the memory (not shown) that holds Tprev, and control returns to the main routine.

FIG. 13 is a flowchart of interrupt process 2 in the second embodiment. When timer 1 or timer 2 raises an interrupt during step S3 of the main routine of FIG. 6, step S20 checks the identifier (timer ID) of the interrupting timer and reads the time value (Tnow) indicated by the system clock 30 of FIG. 11, and step S21 determines whether the timer ID matches the previous timer flag.

If the two values do not match, the timers are judged to be operating normally: step S22 inverts the previous timer flag, step S23 stores Tnow in the memory that holds Tprev, the processing required for the work the timer interrupt should trigger is started, and control returns to the main routine. While both timers operate normally, interrupt process 2 repeats steps S20 to S23.

If either timer 1 or timer 2 fails, consecutive interrupts come from the same timer, as described above. In that case, step S21 determines that the timer ID matches the previous timer flag. Step S24 then computes Tdiff, the difference between the current time Tnow and the value held in the Tprev memory. Step S25 determines whether Tdiff roughly equals 2T, the steady-state interrupt period of each timer.

Some error is expected in practice, so Tdiff is treated as matching 2T if it lies within a range of, for example, about ±10%. In that case, step S26 judges that the timer that did not raise the interrupt has failed and disconnects it. It also resets the timeout time of the timer that raised the interrupt to T, with the reload function enabled. Tnow is then stored in the Tprev memory, the processing required for the timer interrupt's original work is started, and the process returns to the main routine.

Suppose step S21 has found that the timer ID matches the previous timer flag, but step S25 then determines that Tdiff does not match 2T. In that case, step S27 determines whether Tdiff is about T, again within a ±10% error range. If it is, the remaining timer is deemed to be operating normally; this is the timer that was not disconnected and whose timeout was earlier reset to T in step S26. Tnow is stored in the Tprev memory in step S23, the processing required for the timer interrupt's original work is started, and the process returns to the main routine.
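The ±10% tolerance used in steps S25 and S27 amounts to two acceptance windows. The text gives ±10% only as an example. Assuming the tolerance is taken relative to the target value, the windows are:

```latex
\begin{aligned}
\text{S25 (about } 2T\text{)}:&\quad |T_{\mathrm{diff}} - 2T| \le 0.1 \cdot 2T \iff 1.8\,T \le T_{\mathrm{diff}} \le 2.2\,T \\
\text{S27 (about } T\text{)}:&\quad |T_{\mathrm{diff}} - T| \le 0.1 \cdot T \iff 0.9\,T \le T_{\mathrm{diff}} \le 1.1\,T
\end{aligned}
```

The two windows do not overlap, so a given Tdiff can match at most one of the tests. A value outside both windows falls through to step S28, for example when an interrupt arrives well before T has elapsed.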

If a timer fails in a way that makes it raise interrupts before its set timeout time, Tdiff takes a value that matches neither 2T nor T. Suppose, for example, the following sequence in the main routine of FIG. 6:

- Timer 1 raises an interrupt in step S2, and interrupt processing 1 of FIG. 12 completes.
- In step S3, timer 1 raises another interrupt before timer 2 does.

Step S21 of FIG. 13 then determines that the timer ID matches the previous timer flag. Because Tdiff matches neither 2T nor T, the process moves to step S28, which determines whether the timer that did not raise the interrupt (here timer 2) is in operation. If it is, step S29 disconnects the timer that raised the interrupt (timer 1), and the process returns to the main routine. If timer 2 is not in operation, step S30 judges that both timers have failed and issues an instruction to stop the system.

Suppose timer 1 has been disconnected in step S29 and timer 2 is still in operation, and timer 2 next raises an interrupt in step S3 of FIG. 6. In step S21, the timer ID is the identifier of timer 2, while the previous timer flag still holds the identifier of timer 1, so the two do not match. The previous timer flag is therefore inverted in step S22, and Tnow is stored in the Tprev memory in step S23. The processing required for the timer interrupt's original work is then started, and the process returns to the main routine.

Because timer 1 has already been disconnected, the next interrupt detected in step S3 of FIG. 6 also comes from timer 2. This time, step S21 finds that the timer ID and the previous timer flag match. Step S25 judges the Tdiff computed in step S24 to be about 2T, and step S26 is executed before the process returns to the main routine. However, the timer that did not raise the interrupt (timer 1) has already been disconnected, so the disconnection is skipped and only the remaining processing of step S26 is performed.
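The decision logic of interrupt processing 2 (steps S20 to S30) can be sketched as a small state machine. This is a hypothetical illustration, not the patent's code, and it rests on the following assumptions:

- `DualTimerMonitor`, `about`, and the returned step labels are invented names.
- Timer IDs 0 and 1 stand for timers 1 and 2.
- "Disconnecting" or "reprogramming" a timer only updates Python attributes.
- The ±10% tolerance is applied relative to the target value.
- The object starts in the state left by interrupt processing 1: flag = 0, and Tprev = the time of timer 1's first interrupt.

```python
from dataclasses import dataclass, field

TOLERANCE = 0.10  # ±10%, the example tolerance given in the text


def about(value: float, target: float, tol: float = TOLERANCE) -> bool:
    """True if value lies within ±tol (relative) of target."""
    return abs(value - target) <= tol * target


@dataclass
class DualTimerMonitor:
    """Hypothetical sketch of interrupt processing 2 (FIG. 13, steps S20-S30)."""
    T: float
    prev_flag: int = 0     # previous timer flag: ID of the timer that interrupted last
    t_prev: float = 0.0    # system-clock time saved at the previous interrupt (Tprev)
    active: set = field(default_factory=lambda: {0, 1})  # timers not yet disconnected
    timeout: dict = field(init=False)                    # current timeout per timer
    stopped: bool = False  # set when both timers are judged to have failed

    def __post_init__(self):
        # After interrupt processing 1 both timers run at 2T with reload.
        self.timeout = {0: 2 * self.T, 1: 2 * self.T}

    def on_interrupt(self, timer_id: int, t_now: float) -> str:
        """Handle one interrupt; the caller supplies the ID and Tnow (S20)."""
        other = 1 - timer_id
        if timer_id != self.prev_flag:       # S21: IDs alternate, timers normal
            self.prev_flag ^= 1              # S22: invert the flag
            self.t_prev = t_now              # S23: Tprev <- Tnow
            return "S23"
        t_diff = t_now - self.t_prev         # S24
        if about(t_diff, 2 * self.T):        # S25: the other timer has gone silent
            self.active.discard(other)       # S26: disconnect it (no-op if already gone)
            self.timeout[timer_id] = self.T  # survivor now interrupts every T
            self.t_prev = t_now
            return "S26"
        if about(t_diff, self.T):            # S27: already running on one timer
            self.t_prev = t_now              # S23
            return "S27"
        if other in self.active:             # S28: is the other timer still running?
            self.active.discard(timer_id)    # S29: disconnect the early-firing timer
            return "S29"
        self.stopped = True                  # S30: both timers failed, stop the system
        return "S30"


# Timer 2 stops after t = 20; timer 1 keeps firing every 2T, then every T.
m = DualTimerMonitor(T=10, prev_flag=0, t_prev=10)
for tid, t in [(1, 20), (0, 30), (0, 50), (0, 60)]:
    print(t, m.on_interrupt(tid, t))
# 20 S23 / 30 S23 / 50 S26 / 60 S27
```

As in the text, the flag and Tprev are the only history the sketch keeps. A stopped timer shows up as a repeated ID with a spacing of about 2T. An early-firing timer shows up as a repeated ID whose spacing matches neither 2T nor T.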

As described above, the second embodiment uses a system clock that gives a uniform time across the entire system. This makes it possible to determine which of the two timers has failed, so system operation can continue even if one timer fails. However, while the system runs with monitoring by a single timer, a failure of the remaining timer cannot be detected (for example, if it stops raising interrupts), so the failure-detection function is no longer fully effective. Single-timer operation is therefore a stopgap measure for cases such as when the system is difficult to stop. In principle, when a failure of one timer is detected, the system administrator should be alerted, for example by an alarm. When the system can be stopped, its operation should be halted and repairs performed, such as replacing the board that contains the failed timer.

In the second embodiment, timer failures are identified using a system clock that indicates the time uniformly throughout the system. The system clock is not strictly required, however: any comparable counter mounted in the system can be used instead. The memory that stores the time indicated by the system clock can also be provided inside the processor 4 of FIG. 11 rather than in the system monitoring apparatus.

Next, a third embodiment will be described. In the third embodiment, the system monitoring scheme of the present invention is applied to a highly reliable embedded multiprocessor system. FIG. 14 is a block diagram of the configuration of the third embodiment. In the figure, a plurality of processor elements (PEs), here four (4_0 to 4_3), together with a shared memory 35 make up a multiprocessor system. The interrupt controller 3 is connected to each of the processor elements 4_0 to 4_3, as in the first embodiment of FIG. 5.

In the third embodiment, to improve the reliability of the multiprocessor system, each of the four processor elements 4_0 to 4_3 updates predetermined data in the shared memory 35, namely its survival information, at fixed time intervals. Each PE runs a check routine, started in response to the timer interrupt, that checks the survival information of every PE written in the shared memory 35. If any PE's survival information has not been updated, that PE is judged to have failed.

The survival information written to the shared memory 35 may be any data that is updated on every timer interrupt; for example, the value of a local timer built into each PE can be used. If the timer were not duplicated in the third embodiment, a timer failure would prevent the check routine from starting in any PE, and PE failures could no longer be detected.
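Below is a minimal sketch of the survival-information check. It assumes something the text does not specify: each PE's survival information is a counter in its own slot of the shared memory 35, and the PE increments it on every timer interrupt. `heartbeat` and `failed_pes` are hypothetical names, and a Python list stands in for the shared memory. The check routine keeps a snapshot from the previous check and reports every PE whose slot has not changed since then.

```python
def heartbeat(shared: list, pe: int) -> None:
    """Run by PE `pe` on each timer interrupt: update its survival information."""
    shared[pe] += 1


def failed_pes(previous: list, current: list) -> list:
    """Return IDs of PEs whose survival information has not changed
    since the previous check (slot i belongs to PE i)."""
    return [pe for pe, (before, after) in enumerate(zip(previous, current))
            if after == before]


shared_memory = [0, 0, 0, 0]       # stands in for shared memory 35, four PEs
snapshot = list(shared_memory)     # copy taken at the previous check
for pe in (0, 1, 3):               # PE 2 misses its update
    heartbeat(shared_memory, pe)
print(failed_pes(snapshot, shared_memory))  # [2]
```

If the timer stopped interrupting, neither `heartbeat` nor the check routine would run at all. That is the blind spot that duplicating the timer addresses.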

In the third embodiment, one of the PEs (here four) is designated as the master PE, and the master PE maintains system reliability by disconnecting failed PEs. Any method may be used to choose the master; for example, the PE with the smallest identifier (ID) may become the master. Because the master PE may itself fail, a succession rule is also fixed in advance. For example, the PE with the next smallest ID after the master is the next master candidate. If the master PE fails, the next master candidate disconnects it and thereafter operates as the master PE.
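The example master-selection rule can be sketched as follows: the smallest ID becomes the master, and the next smallest ID is the next candidate. The function names are hypothetical. Taking the smallest ID among the PEs not yet judged failed is one way to apply the rule repeatedly, and it is assumed here only for illustration.

```python
from typing import Iterable, Optional


def healthy_pes(pe_ids: Iterable[int], failed: Iterable[int]) -> list:
    """PE IDs not judged failed, in ascending order."""
    return sorted(set(pe_ids) - set(failed))


def master_pe(pe_ids, failed) -> Optional[int]:
    """Master = smallest ID among the PEs that have not failed."""
    healthy = healthy_pes(pe_ids, failed)
    return healthy[0] if healthy else None


def next_master_candidate(pe_ids, failed) -> Optional[int]:
    """Next candidate = second-smallest ID among the PEs that have not failed."""
    healthy = healthy_pes(pe_ids, failed)
    return healthy[1] if len(healthy) > 1 else None


pes = range(4)  # PEs 4_0 to 4_3 as IDs 0 to 3
print(master_pe(pes, set()), next_master_candidate(pes, set()))  # 0 1
print(master_pe(pes, {0}), next_master_candidate(pes, {0}))      # 1 2
```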

In the third embodiment, the main routine of the timer-failure detection process and the flowchart of interrupt processing 1 are assumed to be the same as FIGS. 6 and 7 of the first embodiment. Here, however, each of the PEs 4_0 to 4_3 of FIG. 14 basically executes the main routine of FIG. 6, interrupt processing 1 of FIG. 7, and interrupt processing 2 described with FIG. 15. As described above, processing such as disconnecting a failed PE or instructing an emergency stop of the whole system is performed only by the master PE. The master PE alone could execute the main routine and both interrupt processes. However, handing that processing over when the master PE fails would then be cumbersome. The flowchart is therefore described on the assumption that most of the processing, including the main routine, is executed in parallel by every PE.

FIG. 15 is a flowchart of interrupt processing 2 in the third embodiment. Interrupt processing 2 starts when timer 1 or timer 2 raises an interrupt in the main routine, that is, in step S3 of FIG. 6. First, step S35 checks the ID of the timer that raised the interrupt. Step S36 then checks the survival information of all PEs, including the PE's own, which is the value in each PE's own area of the shared memory. Step S37 determines whether the number of PEs judged to have failed is 0, 1, or more.

If the number of failed PEs is 0, the master PE is also healthy. In step S38, each PE determines whether it is the master PE. A PE that is not the master returns to the main routine, so only the master PE executes step S39 and the steps that follow.

That is, in step S39 the master PE determines whether the timer ID checked in step S35 matches the value of the previous timer flag. If they do not match, the timers are operating normally, so the previous timer flag is inverted in step S40 and the process returns to the main routine.

If step S39 finds that the timer ID matches the previous timer flag, steps S41 to S47 perform processing similar to steps S24 to S30 of FIG. 13 in the second embodiment. The second embodiment computes Tdiff from the time indicated by the system clock, which may of course also be used in this embodiment. The third embodiment instead computes Tdiff in step S41 from the survival information stored in the shared memory. Step S42 determines whether this value is about 2T, allowing an error of ±10%. If it is, step S43 disconnects the timer that did not raise the interrupt and resets the timeout time of the timer that did to T. The previous timer flag is then inverted in step S40, and the process returns to the main routine.

If Tdiff is not about 2T, step S44 determines whether it is about T. If it is, one timer has already been disconnected and operation is continuing on the remaining timer, so the process returns to the main routine. If Tdiff is not about T either, step S45 determines whether the timer that did not raise the interrupt is in operation. If it is, step S46 disconnects the timer that raised the interrupt, and the process returns to the main routine. If the timer that did not raise the interrupt is not in operation, both timers have failed, so step S47 issues an emergency stop instruction to the system.

If step S37 finds exactly one failed PE, step S50 determines whether the failed PE is the master PE. If it is not, step S51 determines whether the current PE is itself the master PE. If it is not, the process returns to the main routine.

If step S50 determines that the master PE has failed, step S52 determines whether the current PE is the next master candidate; if not, the process returns to the main routine. Step S53 is executed in two cases: when the current PE is the next master candidate, or when step S51 has determined that it is the master PE. In step S53, the master PE (or the new master PE) disconnects the failed PE. The previous timer flag is then inverted in step S40, and the process returns to the main routine.

When exactly one PE is judged to have failed, timer-failure detection processing is not performed, such as comparing the timer ID with the previous timer flag. The reason is as follows. If the PE failure check is repeated at short intervals, for example about 1 ms, the probability that a PE and a timer both fail within one such interval is considered very small. For that reason, this embodiment skips the processing needed for timer-failure detection when only one PE is judged to have failed.
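Steps S50 to S53 decide, for each PE, whether that PE should act when exactly one PE has failed. The following is a hypothetical sketch with invented names and string results. "return" means going back to the main routine. "disconnect" means disconnecting the failed PE and then inverting the previous timer flag in step S40. As the text specifies, this path performs no timer check.

```python
def one_failure_action(self_id: int, failed_id: int,
                       master_id: int, candidate_id: int) -> str:
    """Decision of one PE in steps S50-S53 when exactly one PE has failed."""
    if failed_id == master_id:   # S50: the master PE itself failed
        acting = candidate_id    # S52: only the next master candidate acts
    else:
        acting = master_id       # S51: only the master PE acts
    # S53 (then S40): the acting PE disconnects the failed PE and inverts the
    # flag; every other PE simply returns to the main routine.
    return "disconnect" if self_id == acting else "return"


# PE 2 fails while PE 0 is master and PE 1 is the next candidate.
print([one_failure_action(pe, 2, 0, 1) for pe in range(4)])
# ['disconnect', 'return', 'return', 'return']
# The master PE 0 fails instead.
print([one_failure_action(pe, 0, 0, 1) for pe in range(4)])
# ['return', 'disconnect', 'return', 'return']
```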

If step S37 finds two or more failed PEs, step S60 compares the timer ID with the previous timer flag. If they match, the conclusion is that a timer has failed and raised consecutive interrupts in less than the predetermined period, so the PEs' survival information was not updated in time. Step S61 then determines whether the current PE is the master PE; if not, the process returns to the main routine.

If it is the master PE, step S62 determines whether the timer that did not raise the interrupt is in operation. If it is not, both timers have failed, so step S63 instructs the system to perform an emergency stop. If it is in operation, step S64 disconnects the timer that raised the interrupt and resets the timeout time of the timer that did not raise the interrupt to T. The previous timer flag is then inverted in step S40, and the process returns to the main routine.

If step S60 determines that the timer ID and the previous timer flag do not match, the conclusion is that multiple PEs really have failed at the same time, and emergency-stop processing is performed in the following steps. In step S65, each PE checks two conditions: that it has not itself failed, and that it has the smallest ID among the healthy PEs. If both conditions hold, that PE instructs an emergency stop in step S63. This corresponds, for example, to instructing an emergency stop when two of the four PEs in FIG. 14 have failed.

If the condition of step S65 does not hold, for example because the current PE has itself failed, step S66 checks two conditions: that all PEs have failed, and that the current PE is the master PE. If this condition does not hold (for example, because the PE is not the master), the process returns to the main routine. When all PEs have failed, what happens after that return is not well defined; here the process simply returns to the main routine whenever the condition of step S66 does not hold. If the condition of step S66 holds, an emergency stop instruction is issued to the system in step S63. In this description only a single PE issues the emergency stop instruction, but since this is an emergency, all PEs may issue it.
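The branch for two or more failed PEs (steps S60 to S66) can be sketched per PE in the same style. Again, the function name, parameters, and result strings are hypothetical. `other_timer_active` stands for the check made in step S62. The timer-side effects are only named in the result and not performed: disconnection, the reset to T, and the flag inversion in S40.

```python
def multi_failure_action(self_id: int, failed: set, all_pes, master_id: int,
                         timer_id: int, prev_flag: int,
                         other_timer_active: bool) -> str:
    """Decision of one PE in steps S60-S66 when two or more PEs look failed."""
    if timer_id == prev_flag:
        # S60: same timer twice in a row, so a timer fired early and the PEs
        # merely had no time to update their survival information.
        if self_id != master_id:                # S61: only the master acts
            return "return"
        if not other_timer_active:              # S62: the other timer is gone too
            return "emergency stop"             # S63
        return "disconnect interrupting timer"  # S64, then S40
    # Timer IDs alternate, so several PEs really failed at the same time.
    healthy = sorted(set(all_pes) - set(failed))
    if healthy and self_id == healthy[0]:       # S65: lowest-ID healthy PE acts
        return "emergency stop"                 # S63
    if not healthy and self_id == master_id:    # S66: all PEs failed, master acts
        return "emergency stop"                 # S63
    return "return"


# PEs 2 and 3 stopped updating; interrupts still alternate (timer ID 1, flag 0).
print([multi_failure_action(pe, {2, 3}, range(4), 0, 1, 0, True) for pe in range(4)])
# ['emergency stop', 'return', 'return', 'return']
```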

As described above, the third embodiment applies to a system that detects failures of the processor elements of a multiprocessor system in response to timer interrupts. Duplicating the timer in such a system improves its reliability.

Claims

A system monitoring apparatus using a duplex timer, which outputs an interrupt signal from a timer and causes a processor in the monitored system to execute predetermined processing, wherein:
two timers are provided that, using a timeout-time reload function, repeatedly output an interrupt signal at a common time interval, at times offset from each other by half of that interval;
each time the processor receives an interrupt signal from one of the two timers, the processor determines whether the two timers have failed;
the two timers each have a reload register that stores the timeout time;
one of the timers further includes a right 1-bit shift circuit; and
in response to a command sent from the processor at system startup requesting that the common time-interval value be set as the timeout time, half of the time interval is set as the count-out time in the counter of the one timer via the right 1-bit shift circuit.

The system monitoring apparatus using a duplex timer according to claim 1, further comprising a flag register that, in response to an interrupt signal input from one of the two timers, stores a flag indicating the identifier of the timer that output the interrupt signal,
wherein, in response to the next interrupt signal input from a timer, the processor determines whether a timer has failed by comparing the identifier of the timer that output that next interrupt signal with the contents of the flag register.

The system monitoring apparatus using a duplex timer according to claim 2, further comprising a memory that, in response to an interrupt signal input from one of the two timers, stores the time indicated by a system clock provided in the system,
wherein the processor identifies which of the duplicated timers has failed based on the result of the timer-identifier comparison and on a time difference. That time difference is the difference between the time indicated by the system clock when the next interrupt signal is input and the previous interrupt input time stored in the memory.

A system monitoring apparatus using a duplex timer, which outputs an interrupt signal from a timer and causes a processor in the monitored system to execute predetermined processing, wherein:
two timers are provided that, using a timeout-time reload function, repeatedly output an interrupt signal at a common time interval, at times offset from each other by half of that interval;
each time the processor receives an interrupt signal from one of the two timers, the processor determines whether the two timers have failed;
the two timers each have a reload register that stores the timeout time;
one of the timers has two paths between the processor bus and its counter, and further includes a left 1-bit shift circuit located between the bus and the reload register on the path that passes through the reload register;
the other timer likewise has two paths between the processor bus and its counter, one through the reload register and one bypassing it, and further includes a left 1-bit shift circuit located between the bus and the point where those two paths branch; and
in response to a command sent from the processor at system startup requesting that half of the time interval be set as the timeout time, the count-out time of the counter in the one timer is set via the path that does not pass through the reload register.
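The register loading described in claims 1 and 4 can be illustrated with bit shifts. This is an interpretive sketch, not the patent's circuit. The dictionary layout and key names are hypothetical. The sketch also assumes that in claim 1 the right 1-bit shift sits only on the path to the one timer's counter, so that its reload register still receives the full interval. Under these assumptions, both variants leave the timers in the same state.

```python
def load_claim1(interval: int) -> dict:
    """Processor writes the full interval to both timers (claim 1 style)."""
    return {
        # One timer: the right 1-bit shift on the counter path halves the value;
        # the reload register keeps the full interval (assumed).
        "one_timer": {"counter": interval >> 1, "reload": interval},
        "other_timer": {"counter": interval, "reload": interval},
    }


def load_claim4(half: int) -> dict:
    """Processor writes half of the interval to both timers (claim 4 style)."""
    return {
        # One timer: the counter path bypasses the reload register and takes the
        # value as written; a left 1-bit shift doubles it on the reload path.
        "one_timer": {"counter": half, "reload": half << 1},
        # Other timer: a left 1-bit shift ahead of the branch point doubles the
        # value for both the counter and the reload register.
        "other_timer": {"counter": half << 1, "reload": half << 1},
    }


print(load_claim1(1000) == load_claim4(500))  # True
```

A right shift by one bit is floor division by two. With claim 1 style loading, an odd interval value therefore loses its lowest bit.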

A system monitoring method using a duplex timer, in which an interrupt signal is output from a timer and a processor in the monitored system is caused to execute predetermined processing, wherein:
two timers each include a reload register that stores the timeout time, and one of the timers further includes a right 1-bit shift circuit;
in response to a command sent from the processor at system startup requesting that the common time-interval value be set as the timeout time, half of the time interval is set as the count-out time in the counter of the one timer via the right 1-bit shift circuit;
the two timers, using the timeout-time reload function, repeatedly output an interrupt signal at the common time interval, at times offset from each other by half of that interval; and
each time an interrupt signal is input from one of the two timers, the processor determines whether the two timers have failed.

The system monitoring method using a duplex timer according to claim 5, wherein, at system startup, the processor sets the timeout times of the two timers as follows: half of the common time interval for the one timer, with the reload function disabled, and the time interval itself for the other timer, with the reload function enabled;
and when the one timer outputs its first interrupt signal, the timeout time of the one timer is reset to the time-interval value with the reload function enabled.

The system monitoring method using a duplex timer according to claim 5, wherein, in response to an interrupt signal input from one of the two timers, a flag indicating the identifier of the timer that output the interrupt signal is stored in a flag register,
and in response to the next interrupt signal input from a timer, the processor determines whether a timer has failed by comparing the identifier of the timer that output that next interrupt signal with the contents of the flag register.

The system monitoring method using a duplex timer according to claim 7, wherein, in response to an interrupt signal input from one of the two timers, the time indicated by a system clock provided in the system is stored in a memory,
and the processor identifies which of the duplicated timers has failed based on the result of the timer-identifier comparison and on a time difference. That time difference is the difference between the time indicated by the system clock when the next interrupt signal is input and the previous interrupt input time stored in the memory.

A system monitoring apparatus using a duplex timer, which outputs an interrupt signal from a timer and causes each processor in a monitored multiprocessor system to execute predetermined processing, wherein:
two timers are provided that, using a timeout-time reload function, repeatedly output an interrupt signal at a common time interval, at times offset from each other by half of that interval;
each time at least one processor in the multiprocessor system receives an interrupt signal from one of the two timers, that processor determines whether the two timers have failed;
the two timers each have a reload register that stores the timeout time;
one of the timers further includes a right 1-bit shift circuit; and
in response to a command sent from the at least one processor at startup of the multiprocessor system requesting that the common time-interval value be set as the timeout time, half of the time interval is set as the count-out time in the counter of the one timer via the right 1-bit shift circuit.