JP2004295494A

JP2004295494A - Multiple processing node system having versatility and real time property

Info

Publication number: JP2004295494A
Application number: JP2003087168A
Authority: JP
Inventors: Rei Kyo; 黎姜; Yoshihisa Saito; 美寿齋藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-03-27
Filing date: 2003-03-27
Publication date: 2004-10-21

Abstract

PROBLEM TO BE SOLVED: To perform real-time processing and pipeline processing with simple schedule management in a multiple processing node system.

SOLUTION: A plurality of processing nodes (PN1-PN4) that perform prescribed tasks, and whose processing throughputs for the distributed tasks of differing weights are almost uniform, are each provided with a scheduler (SC0-SC4). When task processing completes in its processing node, each scheduler transmits a task completion message through a message network (10) to the scheduler that is to process the next task; on receiving a task completion message from another scheduler, it commands its processing node to start the task. Thus, by installing a simple schedule management program in each scheduler, real-time processing and pipeline processing are performed in the multiple processing node system.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、マルチ処理ノードを有するシステムに関し、特に汎用性及びリアルタイム性を有するマルチ処理ノードシステムに関する。
【０００２】
【従来の技術】
従来のリアルタイム処理システムは、一般的にＡＳＩＣ（ＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃＩＣ）などのような専用ＬＳＩで構成される。かかる専用ＬＳＩによるシステムは、処理能力や消費電力においてメリットを有するものの、再構成が不可能であり、他のアプリケーションに対応することが困難であり、従って汎用性に欠けるというデメリットを有する。そこで、リアルタイム性に加えてある程度の汎用性を有するシステムが要求されているが、かかるシステムは、膨大な処理要求に応えるために複数の処理ノード（処理モジュール）を有するマルチ処理ノード構成をとることが望まれる。マルチ処理ノードシステムは、各処理ノードが同じ構成のもの（Ｈｏｍｏｇｅｎｅｏｕｓ）と各処理ノードの構成が異なる構成のもの（Ｈｅｔｅｒｏｇｅｎｅｏｕｓ）とがあり、本発明は、このヘテロジーニアス構造ベースのものである。各処理ノードがプログラム可能（ｐｒｏｇｒａｍｍａｂｌｅ）または再構成可能（ｒｅ−ｃｏｎｆｉｇｕｒａｂｌｅ）なものであり、各処理に適した異なる構成を有することが多い。これらの複数の処理ノードを協調的に制御するために、集中制御または分散制御の手法がある。集中制御の場合は、複数の処理ノードに対して共通のスケジューラプロセッサを設け、そのスケジューラが複数の処理ノードにタスク開始命令を送信し、複数の処理ノードからタスク終了メッセージを受信し、適宜タスクを各処理ノードに割り当てる。かかるマルチ処理システムは、例えば以下の特許文献１に記載されている。
【０００３】
図１は、従来のマルチ処理ノードシステムの構成例を示す図である。この例では、マルチ処理ノードシステム１が、４つの処理ノードＰＮ１〜ＰＮ４を有し、処理ノードはデータネットワーク２を介して接続され、データの送受信を行う。データネットワーク２には共通メモリ４が設けられ、この共通メモリ４に対して各処理ノードがデータを格納し、データを読み出す。また、４つの処理ノードＰＮ１〜ＰＮ４は、共通のスケジューラ（マイクロコントロールユニット：ＭＣＵ）３によってそれぞれの処理の制御が行われる。即ち、スケジューラ３は、各処理ノードから状態信号や割込信号Ｓ１〜Ｓ４をそれぞれ受信し、図示しない処理プログラムに従って、各処理ノードに制御信号Ｃ１〜Ｃ４を出力する。各処理ノードは同じ構成の汎用プロセッサであり、この制御信号Ｃ１〜Ｃ４に応答してそれぞれの処理を実行する。
【０００４】
【特許文献１】
特公平６−４２２３４号公報（図１）
【０００５】
【発明が解決しようとする課題】
マルチ処理ノードの制御方法としては、パイプライン処理を行わない制御方式と、パイプライン処理を行う制御方式とが考えられる。パイプライン処理を行わない制御方式では、制御は単純ではあるが、複数の処理ノードが並列動作を行わないで、一連の複数のタスクが終了するまで次の一連の複数タスクを開始せず、従って、処理ノードの稼働率が低くなり効率的ではないという問題がある。また、パイプライン処理ではないのでリアルタイム処理を実現できないというデメリットもある。
【０００６】
一方、パイプライン処理を行う制御方式では、処理ノードの稼働率が上がりリアルタイム処理に近づくものの、共通のスケジューラが、各処理ノードへのタスク起動、各処理ノードからのタスク終了割込処理を行うため、スケジューラのプログラムの複雑度が増えるというデメリットが伴う。複数の処理ノードが同じ構成のプロセッサの場合、各処理ノードに割り当てられたタスクの処理の重さのバランスが均等でなく、パイプライン動作の実現が極めて困難になる。特に分岐が発生したときに、パイプライン処理のための同期を取ることが困難になり、データ待ちに伴って処理順序が変更されるとますます制御が複雑化する。更に、処理ノードが増えると、割込処理に多くの処理時間を割く必要があり、リアルタイム性を確保できない可能性がある。このことは、システムの拡張性がなくスケーラビィリティが得られない可能性がある。また、共通のスケジューラに対応して全ての処理タスクが同じメモリ空間を共有する必要があり、処理ノードを異なるアーキテクチャ、コンパイラで設計することが困難になる。
【０００７】
そこで、本発明の目的は、制御を単純化でき、汎用性がありリアルタイム性を有するマルチ処理ノードシステムを提供することにある。
【０００８】
【課題を解決するための手段】
上記の目的を達成するために、本発明の一つの側面は、マルチ処理ノードシステムにおいて、それぞれ所定のタスクを行い、分配された重さが異なるタスクに対する処理スループットがほぼ均等に構成された複数の処理ノードと、前記複数の処理ノードに接続され当該複数の処理ノード間でデータ転送が行われるデータネットワークと、前記複数の処理ノードそれぞれに対応して設けられ、対応する処理ノードへタスク開始制御をしタスク終了信号を受信してタスクのスケジュール制御を行う複数のスケジューラと、前記複数のスケジューラに接続され、当該複数のスケジューラ間でのメッセージの転送が行われるメッセージネットワークとを有する。そして、前記スケジューラは、他のスケジューラから送信されるタスク終了メッセージに応答して、対応する処理ノードにタスク開始制御を行い、当該対応する処理ノードからのタスク終了信号に応答して、次のタスクを行う処理ノードのスケジューラに前記メッセージネットワークを介してタスク終了メッセージを送信する。
【０００９】
上記の発明の側面によれば、各処理ノードそれぞれに対応してスケジューラが設けられ、各スケジューラが対応する処理ノードへのタスク制御を行うとともに、スケジューラ間でタスク終了メッセージの転送を行ってタスク間のスケジュール制御を行う。従って、処理ノードＡがタスクを終了すると次のタスクが処理ノードＢで行われ、それと同時に処理ノードＡが次のタスクを開始することができる。つまり、複数の処理ノードがそれらに分配された重さが異なるタスクに対する処理スループットがほぼ均等になるように構成されているので、一連の複数タスクからなる処理要求に対して、複数の処理ノードが複数のタスクをパイプライン処理することができ、複数の処理ノードのタスク制御を各処理ノードに設けた複数のスケジューラに分散したことで、スケジュール制御が簡単になりリアルタイム処理が可能になる。
【００１０】
上記の発明の側面において、好ましい実施例では、前記処理ノードはローカルメモリを有し、タスク処理終了時に処理済みデータを当該ローカルメモリに格納し、前記スケジューラは、タスク終了メッセージに、終了したタスク情報と共に前記処理済みデータが格納されている処理ノード情報とローカルアドレスとを含める。そして、次のタスクを処理する処理ノードは、当該タスク終了メッセージに含まれた処理ノード情報とローカルアドレスに基づいて、前記データネットワークを介して前記処理済みデータを取得する。各処理ノードには、システムのグローバルアドレスと処理ノード内のローカルアドレスとを変換するアドレス変換回路を有する。
【００１１】
この好ましい実施例では、各処理ノードは、処理可能なタイミングでデータを取得することができる。また、各処理ノードは、それぞれのローカルアドレスに基づいてデータ処理を行うことができ、他の処理ノードとローカルアドレス空間を共有する必要がない。従って、各処理ノードは独自のアーキテクチャで構成し、独自のコンパイラを使用することが可能になり、設計のフレキシビリティが上がりシステムの拡張性が増大する。
【００１２】
上記の発明の側面において、別の好ましい実施例では、各スケジューラは、対応する処理ノードにタスク開始制御を行った時に当該処理ノードへのクロック供給を開始し、処理ノードからのタスク終了信号に応答して、当該クロック供給を停止する。各スケジューラが対応する処理ノードへのタスク制御を行うと共に動作クロックの供給と停止も行うことで、省電力化することができる。
【００１３】
上記の発明の側面において、更に別の実施例では、前記データネットワークは、前記複数の処理ノードに接続されたデータバスと、当該データバスのバス管理を行うバスアービタとを有する。
【００１４】
本発明の別の側面は、各スケジューラはタスク終了メッセージをプッシュ方式で次のタスクを行う処理ノードのスケジューラ供給し、各スケジューラはタスク終了メッセージの受信に応答して、タスク開始制御を行う。その時タスク処理中であれば、それが終了するまで待機する。また、各処理ノードは、タスク開始するときに前タスクを実行した処理ノードのローカルメモリから必要なデータをプル方式で取得する。このようにすることで、リアルタイム処理とパイプライン処理を簡単な制御により実現することができる。
【００１５】
【発明の実施の形態】
以下、図面を参照して本発明の実施の形態例を説明する。しかしながら、本発明の保護範囲は、以下の実施の形態例に限定されるものではなく、特許請求の範囲に記載された発明とその均等物にまで及ぶものである。
【００１６】
図２は、本実施の形態におけるマルチ処理ノードシステムの構成図である。このマルチ処理ノードシステム１は、外部バスとのインターフェースであるブロックインターフェースＢＩＦと、それに対応するスケジューラＳＣ０を有する。ブロックインターフェースＢＩＦは、入力バッファＩＢ０と出力バッファＯＢ０を有する。更に、マルチ処理ノードシステム１は、複数の、例えば４つの処理ノードＰＮ１〜ＰＮ４と、それに対応して処理のスケジュール管理を行うスケジューラＳＣ１〜ＳＣ４とを有する。各スケジューラは、対応する処理ノードに対してのみスケジュール管理を行い、他の処理ノードのスケジュール管理は行わない。但し、スケジューラ間はメッセージを転送してタスク処理の終了などを連絡する。従って、各スケジューラは、対応する処理ノードから状態信号Ｓ０〜Ｓ４を受信し、処理ノードにタスク開始などの制御信号Ｃ０〜Ｃ４を供給する。
【００１７】
処理ノードＰＮ１〜ＰＮ４は、例えば特定の処理を行うプロセッサや汎用プロセッサ、デジタルシグナルプロセッサ、再構成可能なハードウエア（ｒｅｃｏｎｆｉｇｕｒａｂｌｅｈａｒｄｗａｒｅ）などの回路モジュールであり、必ずしも全て同じ構成のプロセッサではなく、任意の目的に対応した異なる構成のプロセッサである。そして、処理ノードは、後述するとおり、分配される重さが異なるタスクに対する処理スループット（処理量）がほぼ均等になるような構成にそれぞれカスタマイズされる。つまり、各処理ノードに分配されるタスクの重さが異なっていても、各処理ノードの処理能力をそれに対応させて、各処理ノードでのスループットをほぼ均等にしている。各処理ノードは、ローカルメモリとして入力バッファＩＢ１〜ＩＢ４と出力バッファＯＢ１〜ＯＢ４を有する。また、スケジューラＳＣ０〜ＳＣ４は、例えばマイクロコントロールユニットであり、ＣＰＵと内部メモリと入出力バッファなどを内蔵する。この内部メモリには、対応する処理ノードのタスク制御プログラムが適宜格納される。
【００１８】
スケジューラＳＣ０〜ＳＣ４は、メッセージネットワーク１０を介して接続され、スケジューラ間でタスク制御に関するメッセージ転送が行われる。このメッセージには、後述するとおり、タスク終了メッセージが少なくとも含まれる。処理ノードＰＮ１〜ＰＮ４とブロックインターフェースＢＩＦは、データネットワーク２を介して接続され、それらの間でデータ転送が行われる。ブロックインターフェースＢＩＦや処理ノードＰＮ１〜ＰＮ４の内部は、ローカルアドレス空間であるのに対して、データネットワーク２ではそれらのブロックインターフェースＢＩＦや処理ノードＰＮ１に共通のグローバルアドレス空間である。そこで、ブロックインターフェースＢＩＦの出力バッファＯＢ０や各処理ノードの入出力バッファＩＢ１〜ＩＢ４、ＯＢ１〜ＯＢ４とデータネットワーク２との間には、グローバルアドレスとローカルアドレスのアドレス変換を行うためのデータポートＤＰ０〜ＤＰ４が設けられている。更に、データネットワーク２は、図２の例では、４つのデータバスＤＢ１〜ＤＢ４が設けられ、それぞれにバス管理を行うバスアービタ回路１２が設けられている。データネットワーク２のデータバスの本数は、データ転送の通信量に応じて適宜選択可能である。
【００１９】
図３は、本実施の形態におけるスケジューラ間のメッセージ転送と処理タスク間のデータ転送を説明する図である。説明のために仮に処理ノードＰＮＡがタスクＡを処理し、それに続いて処理ノードＰＮＢがタスクＢを処理する場合について説明する。まず、スケジューラＳＣＡがタスクＡの開始制御信号ＣＡを処理ノードＰＮＡに供給すると、処理ノードＰＮＡはタスクＡの処理を開始する。そして、処理ノードＰＮＡはタスクＡの処理が終わると、処理済みデータを出力バッファＯＢの所定のローカルアドレス内に格納し、タスクＡの終了を示す状態信号ＳＡをスケジューラＳＣＡに供給する（処理Ｐ１）。このときデータのローカルアドレスなどが状態信号としてスケジューラＳＣＡに供給される。スケジューラＳＣＡはタスク終了の割込信号ＳＡに応答して、タスクＡの終了メッセージをスケジューラＳＣＢにメッセージネットワーク１０を介して転送する（処理Ｐ２）。このタスク終了メッセージには、タスクＡが終了したことと、処理済みデータが格納されている処理ノード情報とそのローカルアドレスとが含まれる。
【００２０】
スケジューラＳＣＢは、タスクＡ終了メッセージに応答して、次のタスクＢのスケジュール制御を行う。即ち、スケジューラＳＣＢは、処理ノードＰＮＢにおいて先行するタスク処理が終了した時に、又は処理ノードＰＮＢにおいて何らタスク処理が行われていない時に、タスクＢの処理開始を指令する制御信号ＣＢを処理ノードＰＮＢに与えて、処理ノードＰＮＢでのタスクＢの処理を起動する。この時、タスクＡ終了メッセージに含まれていたデータ格納先情報、処理ノードＰＮＡ情報とそのローカルアドレス、も処理ノードＰＮＢに与えられる。
【００２１】
処理ノードＰＮＢは、タスクＢの開始制御に応答して、与えられた処理ノードＰＮＡ情報とそのローカルアドレスにしたがって、処理済みデータを処理ノードＰＮＡの出力バッファから取得する（処理Ｐ３）。このデータ取得は、データネットワーク内のバスにアドレスと共にアクセス要求を出すことにより行われる。前述のとおり、アクセス要求に対するバス管理はバスアービタ１２により行われる。そして、処理ノードＰＮＡの出力バッファＯＢから処理ノードＰＮＢの入力バッファＩＢに、要求したデータが転送される（処理Ｐ４）。なお、データポートＤＰでは、ローカルアドレス３０とグローバルアドレス２０とのアドレス変換が行われる。
【００２２】
図４は、本実施の形態におけるスケジューラのタスク制御のフローチャート図である。このフローチャートは、タスク１と２が割り当てられる処理ノードのスケジューラを例にしている。スケジューラは、割り当てられているタスク１またはタスク２の前のタスクの終了メッセージを受信したか否かを常時監視する（Ｓ１００，Ｓ１１０）。この監視は、メッセージネットワーク１０を介してタスク終了メッセージを受信したか否かをチェックすることで行われる。更に、スケジューラは、対応する処理タスクがタスク処理を終了したか否かについても監視する（Ｓ１２０）。この監視は、対応する処理ノードからの状態信号により行われる。
【００２３】
タスク終了メッセージの受信に応答して、現在処理ノードが別のタスクを処理中か否かをチェックし（Ｓ１０２，Ｓ１１２）、処理中でなければ、タスク開始制御を行い（Ｓ１０４，Ｓ１１４）、処理中であればタスク待ち行列に登録する（Ｓ１０６，Ｓ１１６）。また、処理ノードからのタスク終了信号に応答して、次のタスクを処理する処理ノードのスケジューラにタスク終了メッセージをメッセージネットワーク１０を介して送信する（Ｓ１２２）。送信後、待ちタスクがあれば（Ｓ１２４）、次のタスク開始制御信号を処理ノードに与えてタスク処理を開始させる（Ｓ１２６）。
【００２４】
更に、各タスク開始制御Ｓ１０４，Ｓ１１４，Ｓ１２６では、先行するタスクが処理して出力バッファに格納したデータが、後続のタスク処理に伴って別の処理ノードから読み出されて、出力バッファが空き状態になることが確認される。つまり、タスク処理に伴って処理済みデータを出力バッファに格納し、後続のタスク処理を行う処理ノードからその処理済みデータを読み出すようにしているので、出力バッファの容量に限りがある場合は、当該後続のタスク処理を行う処理ノードにより処理済みデータが読み出された後でなければ、新たなタスク処理を開始できない。
【００２５】
以上のように、スケジューラは、対応する処理ノードに割り当てられたタスクの開始制御を先行するタスクの終了メッセージに応答して行い、タスク終了後はタスク終了メッセージを別のスケジューラに送信する。従って、スケジューラは制御下の処理ノードに対してのみタスク制御を行えば良いので、制御が簡単であり、スケジュール制御のためのオーバーヘッドを少なくすることができる。また、一連の複数タスクからなる処理プログラムに対して、割り当てられたタスクを処理すれば、タスクの待ち行列がないかぎり次の処理プログラムに対するタスクを処理することができ、複数の処理ノードによる並列処理が可能になる。
【００２６】
図５は、本実施の形態におけるスケジューラのクロック制御を説明する図である。スケジューラＳＣ１は、処理ノードＰＮ１のタスク管理に伴って動作クロックの管理も行う。即ち、スケジューラＳＣ１は処理ノードにタスク開始制御を行う時に（時間ｔ１）、ＰＬＬ回路にクロックイネーブル信号を出力してＰＬＬ回路から動作クロックｃｌｏｃｋを処理ノードＰＮ１に供給させる。そして、処理ノードＰＮ１から状態信号Ｓ１によってタスクが完了したことを検出すると、次に実行すべきタスクがない場合は（時間ｔ２）、クロックイネーブル信号をディセーブルにしてＰＬＬ回路による動作クロックの供給を停止する。このようにスケジューラがタスク制御に加えて動作クロックの制御も行うことにより、処理ノードがタスク処理中でない時に動作クロックの供給を停止して省電力化する。図４のフローチャートでは、工程Ｓ１０４，Ｓ１１４でクロックイネーブル信号が活性化され、工程Ｓ１２４で待ちタスクがない場合に非活性化される。
【００２７】
図６は、プログラムの具体例（１）を示す図である。この具体例では、プログラムＡとプログラムＢとがタスク処理ノードシステムによる処理を要求されるプログラムであり、図中（ａ）に示されるように、プログラムＡは、シーケンシャルに処理されるタスクａ１〜ａ５からなり、プログラムＢは、シーケンシャルに処理されるタスクｂ１〜ｂ４からなる。そこで、図中（ｂ）に示されるように、これらのプログラムに含まれる重さが異なるタスクが、４つの処理ノードＰＮ１〜ＰＮ４においてできるだけ均等な処理量になるように分配される。つまり、プログラムＡについては、モジュール１によるタスクａ１，ａ３の処理スループットと、モジュール２によるタスクａ２，ａ５の（処理スループット）と、モジュール３によるタスクａ４の処理スループットとはほぼ均等になるように、各モジュールの構成及びタスクの分配が行われている。同様に、プログラムＢについては、モジュール１によるタスクｂ１の処理スループットと、モジュール２によるタスクｂ２の処理スループットと、モジュール３によるタスクｂ３の処理スループットと、モジュール４によるタスクｂ４の処理スループットもほぼ均等になるように、各モジュールの構成及びタスクの分配が行われている。このように、各モジュールの処理スループットを同程度にすることで、後述する例に示されるとおり、リアルタイム処理及びパイプライン処理が可能になる。
【００２８】
図中、処理ノードＰＮ１〜ＰＮ４は、モジュール１〜４とも称され、以下モジュールで説明する場合もあるが、それは処理ノードの意味である。また、各処理ノードは、それぞれ所定の処理に適した構成を有し、その適した処理に応じても前記タスクが分配される。この具体例のようにモジュールにタスクを分配した場合の各スケジューラのタスク管理プログラム例を以下説明する。
【００２９】
図７は、スケジューラＳＣ０のタスク管理プログラム例である。この管理プログラムは、ステップＰ１００〜Ｐ１０３からなり、ステップＰ１００では、プログラムＡの起動命令を受信するとモジュール１のスケジューラに「プログラムＡの起動、ＢＩＦのアドレスａｄｄｒ０、データサイズｓｉｚｅ」なるメッセージを送信する。ステップＰ１０１では、プログラムＡのタスクａ５が完了したメッセージを監視し、完了メッセージを受信したらモジュール２（ｍ２）のアドレスａｄｄｒ０にデータサイズｓｉｚｅのデータを、システム外部に出力する。また、ステップＰ１０２では、プログラムＢの起動命令を受信するとモジュール１のスケジューラに「プログラムＢの起動、ＢＩＦのアドレスａｄｄｒ１にデータサイズｓｉｚｅ」なるメッセージを送信する。ステップＰ１０３では、プログラムＢのタスクｂ４が完了したメッセージを監視し、完了メッセージを受信したらモジュール４（ｍ４）のアドレスａｄｄｒ０にデータサイズｓｉｚｅのデータを、システム外部に出力する。
【００３０】
図８は、スケジューラＳＣ１とＳＣ２のタスク管理プログラム例である。スケジューラＳＣ１のプログラム例では、ステップＰ１１０で、プログラムＡの起動メッセージ「ｐＡ−＞ｓｔａｒｔ」の受信を監視し、受信したら、メッセージに含まれているＢＩＦのローカルアドレスａｄｄｒ０にアクセスしてデータを取得し、タスクａ１を実行し、タスクａ１終了後にモジュール２のスケジューラにメッセージ「ｐＡ−＞ｔａｓｋ．ａ１，ｍ１．ａｄｄｒ０，ｓｉｚｅ」を送信する。ステップＰ１１１では、プログラムＡのタスクａ２の完了メッセージ「ｐＡ−＞ｔａｓｋ．ａ２」の受信を監視し、受信したら、メッセージに含まれているモジュール２のローカルアドレスａｄｄｒ０にアクセスしてデータを取得し、タスクａ３を実行し、タスクａ３終了後にモジュール３のスケジューラにメッセージ「ｐＡ−＞ｔａｓｋ．ａ３，ｍ１．ａｄｄｒ１，ｓｉｚｅ」を送信する。ステップＰ１１２は、プログラムＢの起動メッセージを受信したときの制御でありステップＰ１１０と同様である。
【００３１】
スケジューラＳＣ２のタスク管理プログラムでは、ステップＰ１２０で、プログラムＡのタスクａ１の完了メッセージを受信した時に次のタスクａ２を開始し、それが終わったらモジュール１のスケジューラに完了メッセージを送信する。ステップＰ１２１，Ｐ１２２も同様である。
【００３２】
図９は、スケジューラＳＣ３とＳＣ４のタスク管理プログラム例である。ステップＰ１３０では、プログラムＡのタスクａ３の完了メッセージを受信した時に次のタスクａ４を開始し、それが終わったらモジュール２のスケジューラに完了メッセージを送信する。ステップＰ１３１も同様である。スケジューラＳＣ４のタスク管理プログラムでは、プログラムＢのタスクｂ３の完了メッセージを受信した時に次のタスクｂ４を開始し、それが終わったらブロックインターフェースＢＩＦのスケジューラに完了メッセージを送信する。
【００３３】
以上のタスク管理プログラムを各スケジューラＳＣ０〜ＳＣ４にインストールすることで、プログラムＡとＢを外部のシステムバスからの起動要求に応答して、マルチ処理ノードシステム１がリアルタイム処理で且つパイプライン処理で実行することができる。
【００３４】
図１０は、具体例（１）におけるプログラムＡとＢに対するパイプライン処理を示すタイミングチャート図である。横軸が時間でありブロックインターフェースＢＩＦに供給されるデータと、各モジュールが行うタスクが示されている。この例では、プログラムＡとプログラムＢの実行要求が交互に行われた例であり、ブロックインターフェースＢＩＦに、データＡとデータＢとが交互に供給されている。また、前述のとおり、モジュール１〜４におけるそれぞれのタスク処理スループットがほぼ同じになるようにハードウエア構成及びタスク分配が行われている。
【００３５】
図１０では、パイプラインステージＰＳ１でデータＡがブロックインターフェースＢＩＦに供給されて、プログラムＡの実行要求が行われる。これに応答して、次のパイプラインステージＰＳ２で、モジュール１がタスクａ１を処理し、その完了メッセージに応答してモジュール２がタスクａ２をそれぞれ処理する。同じステージＰＳ２でデータＢがブロックインターフェースＢＩＦに供給されてプログラムＢの実行要求が行われる。
【００３６】
この実行要求に応答して、次のステージＰＳ３でモジュール１はタスクｂ１を処理する。この間、ステージＰＳ１のデータＡに対するタスクａ３の処理は、待ち行列に入れられる。また、このステージＰＳ３にてデータＡが供給される。
【００３７】
ステージＰＳ４では、ステージＰＳ３でのデータＡに応答して、モジュール１がタスクａ１を処理し、更に、モジュール１は、ステージＰＳ１のデータＡに対するタスクａ３を処理する。ステージＰＳ４では、モジュール２がタスクｂ２を処理しているので、モジュール２はステージＰＳ３のデータＡに対するタスクａ２を処理することはできない。
【００３８】
ステージＰＳ５では、モジュール２が待ち状態にあったステージＰＳ３のデータＡに対するタスクａ２を処理し、モジュール３がタスクａ４を処理する。同時に、モジュール１は、データＢに対するタスクｂ１を処理する。
【００３９】
更に、ステージＰＳ６では、モジュール２は、ステージＰＳ４のデータＢに対するタスクｂ２を処理し、モジュール３は、ステージＰＳ２のデータＢに対するタスクｂ３を処理する。
【００４０】
そして、ステージＰＳ７にて、ステージＰＳ１のデータＡに対するタスクａ５がモジュール２により処理され、ステージＰＳ２のデータＢに対するタスクｂ４がモジュール４により処理される。つまり、ステージＰＳ７で最初のプログラムＡと次のプログラムＢの処理が完了する。
【００４１】
それ以降は、データＡとデータＢが交互に供給されることに応答して、モジュール１は、タスクａ１，ａ３の処理とタスクｂ１の処理とを交互に行い、モジュール２は、タスクａ２，ａ５の処理とタスクｂ２の処理とを交互に行い、モジュール３は、タスクａ４の処理とタスクｂ３の処理とを交互に行い、モジュール４は、タスクｂ４の処理を間欠的に行う。つまり、各処理ノードのタスク処理が飽和した状態で、各ステージ毎に要求されるプログラムＡ及びＢがパイプライン処理される。従って、データＡ，Ｂがリアルタイムで処理される。
【００４２】
また、図示しないが、ステージＰＳ２でデータＡが供給された場合は、モジュール２でのタスクａ２の処理に伴ってモジュール１内の出力バッファ内の処理済みデータが読み出されないと、モジュール１でタスクａ１を処理開始することはできない。従って、ステージＰＳ２でモジュール１に処理の余力があったとしても、モジュール１は、ステージＰＳ３でタスクａ１の処理を行う。出力バッファのタスクａ１に対する容量が大きい場合は、初期状態において、複数のタスクａ１を連続して処理することが可能である。但し、パイプライン処理が飽和した状態では、図１０のような処理になる。
【００４３】
図１１は、プログラムの具体例（２）を示す図である。この具体例では、プログラムＡとプログラムＢとがタスク処理ノードシステムで処理を要求されるプログラムであり、図中（ａ）に示されるように、プログラムＡは、タスクａ１〜ａ５を有し、タスクａ１を完了したときの状態がケース１の時はタスクａ２またはａ３のいずれか一方を実行し、タスクａ１を完了したときの状態がケース２の時はタスクａ２とａ３の両方を実行する分岐命令が含まれている。そして、タスクａ１，ａ５が処理ノードＰＮ１に分配され、タスクａ２，ａ３，ａ４がそれぞれ処理ノードＰＮ２，ＰＮ３，ＰＮ４に分配されている。この分配は、それぞれの処理ノードに対応するスケジューラのタスク管理プログラムを記述することにより行うことができる。
【００４４】
また、前述と同様に、各モジュールでの分配されたタスクの処理スループットがほぼ均等になるように、モジュールの構成とタスクの配分が行われている。
【００４５】
図１２は、スケジューラＳＣ０とＳＣ１のタスク管理プログラム例である。スケジューラＳＣ０の管理プログラムにおいて、ステップＰ１００では、プログラムＡの起動命令を受信するとモジュール１のスケジューラに「プログラムＡの起動、ＢＩＦのアドレスａｄｄｒ０、データサイズｓｉｚｅ」なるメッセージを送信する。ステップＰ１０４では、プログラムＡのタスクａ５が完了したメッセージを監視し、完了メッセージを受信したらモジュール１（ｍ１）のアドレスａｄｄｒ１にデータサイズｓｉｚｅのデータを、システム外部に出力する。
【００４６】
図１２は、ケース１の場合のスケジューラＳＣ１のプログラム例であり、ステップＰ１１３で、プログラムＡの起動メッセージ「ｐＡ−＞ｓｔａｒｔ」の受信を監視し、受信したら、メッセージに含まれているＢＩＦのローカルアドレスａｄｄｒ０にアクセスしてデータを取得し、タスクａ１を実行し、タスクａ１終了後に、モジュール１の状態が「タスク２」ならモジュール２のスケジューラにメッセージ「ｐＡ−＞ｔａｓｋ．ａ１，ｍ１．ａｄｄｒ０，ｓｉｚｅ」を送信し、モジュール１の状態が「タスク３」ならモジュール３のスケジューラにメッセージ「ｐＡ−＞ｔａｓｋ．ａ１，ｍ１．ａｄｄｒ０，ｓｉｚｅ」を送信する。ステップＰ１１４では、プログラムＡのタスクａ５の完了メッセージ「ｐＡ−＞ｔａｓｋ．ａ５」の受信を監視し、受信したら、メッセージに含まれているモジュール４のローカルアドレスａｄｄｒ０にアクセスしてデータを取得し、タスクａ５を実行し、タスクａ５終了後にＢＩＦのスケジューラにメッセージ「ｐＡ−＞ｔａｓｋ．ａ５，ｍ１．ａｄｄｒ１，ｓｉｚｅ」を送信する。
【００４７】
図１３は、スケジューラＳＣ２とＳＣ３のタスク管理プログラム例である。スケジューラＳＣ２のタスク管理プログラムでは、ステップＰ１２３で、プログラムＡのタスクａ１の完了メッセージを受信した時に次のタスクａ２を開始し、それが終わったらモジュール４のスケジューラに完了メッセージを送信する。また、スケジューラＳＣ３のタスク管理プログラムでは、ステップＰ１３２で、プログラムＡのタスクａ１の完了メッセージを受信した時に次のタスクａ３を開始し、それが終わったらモジュール４のスケジューラに完了メッセージを送信する。
【００４８】
図１４は、スケジューラＳＣ４のタスク管理プログラム例である。これもケース１の場合の例である。ステップＰ１４１で、プログラムＡのタスクａ２の完了メッセージを受信した時に次のタスクａ４を開始し、それが終わったらＢＩＦのスケジューラに完了メッセージを送信する。また、ステップＰ１４２で、プログラムＡのタスクａ３の完了メッセージを受信した時に次のタスクａ４を開始し、それが終わったらＢＩＦのスケジューラに完了メッセージを送信する。
【００４９】
図１５は、ケース２の場合のスケジューラＳＣ１とＳＣ４のタスク管理プログラム例である。スケジューラＳＣ１のプログラム例では、ステップＰ１１５で、プログラムＡの起動メッセージ「ｐＡ−＞ｓｔａｒｔ」の受信を監視し、受信したら、メッセージに含まれているＢＩＦのローカルアドレスａｄｄｒ０にアクセスしてデータを取得し、タスクａ１を実行し、タスクａ１終了後に、モジュール２と３のスケジューラにメッセージ「ｐＡ−＞ｔａｓｋ．ａ１，ｍ１．ａｄｄｒ０，ｓｉｚｅ」を送信する。ステップＰ１１４は前述の通りである。
【００５０】
また、スケジューラＳＣ４のタスク管理プログラムでは、ステップＰ１４３で、プログラムＡのタスクａ２とａ３の完了メッセージ「ｐＡ−＞ｔａｓｋ．ａ２」、「ｐＡ−＞ｔａｓｋ．ａ３」の受信を監視し、両方受信したら、メッセージに含まれているモジュール２及び３のローカルアドレスａｄｄｒ０にアクセスして両方のデータを取得し、タスクａ４を実行し、タスクａ４終了後にＢＩＦのスケジューラにメッセージ「ｐＡ−＞ｔａｓｋ．ａ４，ｍ４．ａｄｄｒ０，ｓｉｚｅ」を送信する。
【００５１】
図１６は、具体例（２）におけるプログラムＡのパイプライン処理を示すタイミングチャート図である。図１６（Ａ）がケース１の例であり、図１６（Ｂ）がケース２の例である。前述のとおり、各モジュールの構成とタスク分配は、モジュール１によるタスクａ１，ａ５の処理スループットと、モジュール２によるタスクａ２の処理スループットと、モジュール３によるタスクａ３の処理スループットと、モジュール４によるタスクａ４の処理スループットとがほぼ均等になるようにされている。以下、各パイプラインステージにてデータがブロックインターフェースに供給されて、プログラムＡの起動要求がある場合について、各ケース１，２のパイプライン処理について説明する。
【００５２】
図１６（Ａ）の場合、パイプラインステージＰＳ１で供給されたデータ１に対して、次のステージＰＳ２でモジュール１がタスクａ１を処理する。更に、タスクａ１の処理完了に伴って、モジュール２がタスクａ２を処理する。ステージＰＳ２で供給されたデータ２に対しては、モジュール２がタスクａ２の処理に伴ってモジュール１の出力バッファ内の処理済みデータを読み出しているので、ステージＰＳ２でモジュール１がデータ２に対するタスクａ１を開始することはない。
【００５３】
ステージＰＳ３では、モジュール１がデータ２に対するタスクａ１を処理する。また、タスクａ２の処理完了に応答して、モジュール４がタスクａ４の処理を開始する。また、データ２に対するタスクａ１を処理した結果、モジュール１の状態がタスクａ３になっているとすると、ステージＰＳ３でモジュール３がタスクａ３を処理開始する。
【００５４】
ステージＰＳ４では、モジュール１がデータ３に対するタスクａ１を処理し、残った時間でデータ１に対するタスクａ５を処理する。また、データ３に対するタスクａ１の処理の結果、タスクａ２の状態になり、モジュール２がタスクａ２を処理している。更に、モジュール４はデータ２に対するタスクａ４を処理する。
【００５５】
その後は、各パイプラインステージで、モジュール１はタスクａ１，ａ５を処理し、モジュール２または３はタスクａ２またはａ３を処理し、モジュール４はタスクａ４を処理する。ステージ毎にデータが供給された場合、この状態で飽和する。
【００５６】
図１６（Ｂ）の場合は、タスクａ１を処理した後タスクａ２，ａ３が同時に処理される。それ以外は、図１６（Ａ）と同じである。
【００５７】
以上のように、上記の実施の形態では、複数の処理ノードの構成をそれぞれ異なる構成にして、それぞれに分配された重さが異なるタスクに対して処理スループットが均等になるようにし、更に、各処理ノードをスケジュール制御するスケジューラが、タスク完了メッセージを転送し、そのタスク完了メッセージに応答して次のタスク開始制御をする。その結果、パイプライン処理が可能になり、データのリアルタイム処理が可能になる。
【００５８】
以上、実施の形態例をまとめると以下の付記の通りである。
【００５９】
（付記１）マルチ処理ノードシステムにおいて、
それぞれ所定のタスクを行い、分配された重さが異なるタスクに対する処理スループットがほぼ均等に構成された複数の処理ノードと、
前記複数の処理ノードに接続され当該複数の処理ノード間でデータ転送が行われるデータネットワークと、
前記複数の処理ノードそれぞれに対応して設けられ、対応する処理ノードへタスク開始制御をし更にタスク終了信号を受信してタスクのスケジュール制御を行う複数のスケジューラと、
前記複数のスケジューラに接続され、当該複数のスケジューラ間でのメッセージの転送が行われるメッセージネットワークとを有し、
前記スケジューラは、他のスケジューラから送信されるタスク終了メッセージに応答して、対応する処理ノードにタスク開始制御を行い、当該対応する処理ノードからのタスク終了信号に応答して、次のタスクを行う処理ノードのスケジューラに前記メッセージネットワークを介してタスク終了メッセージを送信することを特徴とするマルチ処理ノードシステム。
【００６０】
（付記２）付記１において、
前記処理ノードはローカルメモリを有し、タスク処理終了時に処理済みデータを当該ローカルメモリに格納し、
前記スケジューラは、タスク終了メッセージに、終了したタスク情報と共に前記処理済みデータが格納されている処理ノード情報とローカルアドレスとを含め、
後続のタスクを処理する処理ノードは、当該タスク終了メッセージに含まれた処理ノード情報とローカルアドレスに基づいて、前記データネットワークを介して前記処理済みデータを取得することを特徴とするマルチ処理ノードシステム。
【００６１】
（付記３）付記１において、
各処理ノードには、システム内のグローバルアドレスと処理ノード内のローカルアドレスとを変換するアドレス変換回路を、前記データネットワークとの間に有することを特徴とするマルチ処理ノードシステム。
【００６２】
（付記４）付記１において、
各スケジューラは、対応する処理ノードにタスク開始制御を行った時に当該処理ノードへのクロック供給を開始し、処理ノードからのタスク終了信号に応答して当該クロック供給を停止することを特徴とするマルチ処理ノードシステム。
【００６３】
（付記５）付記１において、
前記データネットワークは、前記複数の処理ノードに接続されたデータバスと、当該データバスのバス管理を行うバスアービタとを有することを特徴とするマルチ処理ノードシステム。
【００６４】
（付記６）付記１において、
前記スケジューラは、他のスケジューラからのタスク完了メッセージの受信と、対応する処理ノードからのタスク終了信号の受信とを監視し、前記タスク完了メッセージを受信したときに、対応する処理ノードがタスク処理中の場合は、当該タスク完了メッセージに対応するタスクの開始を待機し、前記タスク終了信号の受信に応答して、当該待機させたタスクの開始を制御することを特徴とするマルチ処理ノードシステム。
【００６５】
（付記７）付記１において、
前記複数の処理ノードが、複数のタスクを並列に処理して、パイプライン処理が行われることを特徴とするマルチ処理ノードシステム。
【００６６】
（付記８）マルチ処理ノードシステムにおいて、
それぞれ所定のタスクを行う複数の処理ノードと、
前記複数の処理ノードに接続され当該複数の処理ノード間でデータ転送が行われるデータネットワークと、
前記複数の処理ノードそれぞれに対応して設けられ、対応する処理ノードへタスク開始制御を行う複数のスケジューラと、
前記複数のスケジューラに接続され、当該複数のスケジューラ間でのメッセージの転送が行われるメッセージネットワークとを有し、
各スケジューラは、対応する処理ノードからのタスク終了信号に応答して、タスク完了メッセージをプッシュ方式で次のタスクを行う処理ノードのスケジューラ転送し、各スケジューラは、タスク完了メッセージの受信に応答して、前記タスク開始制御を行い、
各処理ノードは、タスク処理を開始するときに前タスクを実行した処理ノードのローカルメモリから必要なデータをプル方式で取得することを特徴とするマルチ処理ノードシステム。
【００６７】
（付記９）付記８において、
前記タスク完了メッセージには、処理済みデータのローカルアドレスが含まれていることを特徴とするマルチ処理ノードシステム。
【００６８】
（付記１０）付記８において、
前記スケジューラは、前記タスク完了メッセージを受信した時に、対応する処理ノードがタスク処理中であれば、当該タスクが終了するまで前記タスク開始制御を待機することを特徴とするマルチ処理ノードシステム。
【００６９】
【発明の効果】
以上、本発明によれば、複数の処理ノードそれぞれにスケジューラを設け、スケジューラは対応する処理ノードでタスク処理が完了するとタスク完了メッセージを次のタスクを処理する他のスケジューラに送信し、他のスケジューラからタスク完了メッセージを受信すると対応する処理ノードにそのタスクの処理開始を指令するので、簡単なスケジュール管理で、リアルタイム処理及びパイプライン処理を実現することができる。
【図面の簡単な説明】
【図１】従来のマルチ処理ノードシステムの構成例を示す図である。
【図２】本実施の形態におけるマルチ処理ノードシステムの構成図である。
【図３】本実施の形態におけるスケジューラ間のメッセージ転送と処理タスク間のデータ転送を説明する図である。
【図４】本実施の形態におけるスケジューラのタスク制御のフローチャート図である。
【図５】本実施の形態におけるスケジューラのクロック制御を説明する図である。
【図６】プログラムの具体例（１）を示す図である。
【図７】スケジューラＳＣ０のタスク管理プログラム例である。
【図８】スケジューラＳＣ１とＳＣ２のタスク管理プログラム例である。
【図９】スケジューラＳＣ３とＳＣ４のタスク管理プログラム例である。
【図１０】具体例（１）におけるプログラムＡとＢに対するパイプライン処理を示すタイミングチャート図である。
【図１１】プログラムの具体例（２）を示す図である。
【図１２】スケジューラＳＣ０とＳＣ１のタスク管理プログラム例である。
【図１３】スケジューラＳＣ２とＳＣ３のタスク管理プログラム例である。
【図１４】スケジューラＳＣ４のタスク管理プログラム例である。
【図１５】ケース２の場合のスケジューラＳＣ１とＳＣ４のタスク管理プログラム例である。
【図１６】具体例（２）におけるプログラムＡのタスク処理のタイミングチャート図である。
【符号の説明】
ＰＮ１〜ＰＮ４：処理ノード、ＳＣ０〜ＳＣ４：スケジューラ
２：データネットワーク、１０：メッセージネットワーク
１２：バスアービタ

[0001]
[Technical Field of the Invention]
The present invention relates to a system having multiple processing nodes, and more particularly to a multi-processing node system that offers both versatility and real-time performance.
[0002]
[Prior art]
A conventional real-time processing system is generally built from a dedicated LSI such as an ASIC (Application Specific IC). A system based on such a dedicated LSI has advantages in processing capacity and power consumption, but it cannot be reconfigured and is difficult to adapt to other applications, so it lacks versatility. There is therefore a demand for systems that offer a certain degree of versatility in addition to real-time performance, and such a system is desirably given a multi-processing node configuration, with a plurality of processing nodes (processing modules), in order to meet very large processing demands. Multi-processing node systems are either homogeneous, where every processing node has the same configuration, or heterogeneous, where the processing nodes have different configurations; the present invention is based on the heterogeneous structure. Each processing node is programmable or reconfigurable, and the nodes often have different configurations suited to their respective processing. These processing nodes can be controlled cooperatively by either centralized control or distributed control. With centralized control, a common scheduler processor is provided for the plurality of processing nodes; the scheduler sends task start commands to the processing nodes, receives task end messages from them, and assigns tasks to each processing node as appropriate. Such a multi-processing system is described, for example, in Patent Document 1 below.
[0003]
FIG. 1 is a diagram illustrating a configuration example of a conventional multi-processing node system. In this example, the multi-processing node system 1 has four processing nodes PN1 to PN4, and the processing nodes are connected via the data network 2 and transmit and receive data. A common memory 4 is provided in the data network 2, and each processing node stores data in and reads data from the common memory 4. The processing of the four processing nodes PN1 to PN4 is controlled by a common scheduler (micro control unit: MCU) 3. That is, the scheduler 3 receives the state signals and the interrupt signals S1 to S4 from the respective processing nodes, and outputs the control signals C1 to C4 to the respective processing nodes according to a processing program (not shown). Each processing node is a general-purpose processor having the same configuration, and executes each processing in response to the control signals C1 to C4.
[0004]
[Patent Document 1]
Japanese Patent Publication No. 6-42234 (FIG. 1)
[0005]
[Problems to be solved by the invention]
Two control methods can be considered for multiple processing nodes: one that does not perform pipeline processing and one that does. Without pipeline processing, control is simple, but the processing nodes do not operate in parallel and the next series of tasks is not started until the current series has completed, so the utilization of the processing nodes is low and the system is inefficient. A further disadvantage is that real-time processing cannot be achieved because the processing is not pipelined.
[0006]
With pipeline processing, on the other hand, the utilization of the processing nodes rises and the system approaches real-time processing, but because the common scheduler both starts tasks on every processing node and handles the task end interrupts from every processing node, the scheduler program becomes more complex. When the processing nodes are processors of the same configuration, the weights of the tasks assigned to the nodes are not evenly balanced, which makes pipeline operation extremely difficult to achieve. In particular, when a branch occurs it becomes difficult to synchronize the pipeline, and if the processing order changes because of waiting for data, control becomes still more complicated. Furthermore, as the number of processing nodes grows, more processing time must be devoted to interrupt handling, and real-time performance may no longer be guaranteed. The system may therefore lack extensibility and fail to scale. In addition, with a common scheduler all processing tasks must share the same memory space, which makes it difficult to design the processing nodes with different architectures and compilers.
[0007]
Therefore, an object of the present invention is to provide a multi-processing node system that can simplify control, is versatile, and has real-time properties.
[0008]
[Means for Solving the Problems]
To achieve the above object, one aspect of the present invention is a multi-processing node system comprising: a plurality of processing nodes, each performing predetermined tasks and configured so that their processing throughputs for the distributed tasks of differing weights are substantially equal; a data network connected to the plurality of processing nodes, over which data is transferred between the processing nodes; a plurality of schedulers, one provided for each of the processing nodes, each of which performs task start control on its corresponding processing node and receives a task end signal from it to perform task schedule control; and a message network connected to the plurality of schedulers, over which messages are transferred between the schedulers. Each scheduler performs task start control on its corresponding processing node in response to a task end message transmitted from another scheduler, and, in response to a task end signal from its corresponding processing node, transmits a task end message via the message network to the scheduler of the processing node that performs the next task.
[0009]
According to this aspect of the invention, a scheduler is provided for each processing node; each scheduler performs task control for its corresponding processing node and transfers task end messages to other schedulers to perform schedule control between tasks. Therefore, when processing node A finishes a task, the next task is performed at processing node B, and at the same time processing node A can start its own next task. In other words, because the processing nodes are configured so that their processing throughputs for the distributed tasks of differing weights are substantially equal, the processing nodes can pipeline the tasks of a processing request consisting of a series of tasks; and because task control is distributed across the schedulers provided for the individual processing nodes, schedule control is simplified and real-time processing becomes possible.
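The benefit of roughly equal per-node throughput can be illustrated with a small timing model. The sketch below is not taken from the patent: it assumes a simple linear pipeline in which every request visits the stages in order, each stage handles one request at a time, all requests are available at time 0, and buffering between stages is unlimited. The function name `pipeline_finish_times` and the stage times are hypothetical.

```python
def pipeline_finish_times(stage_times, n_requests):
    """Return the completion time of each request in a linear pipeline.

    Hypothetical model: every request visits the stages in order, each stage
    handles one request at a time, and all requests are ready at time 0.
    """
    prev = [0.0] * len(stage_times)  # finish times of the previous request
    finished = []
    for _ in range(n_requests):
        t = 0.0  # time at which this request leaves the preceding stage
        cur = []
        for s, cost in enumerate(stage_times):
            start = max(t, prev[s])  # wait for the data and for the stage
            t = start + cost
            cur.append(t)
        prev = cur
        finished.append(t)
    return finished


# Same total work per request (6 time units), split evenly or unevenly.
balanced = pipeline_finish_times([2, 2, 2], 4)
unbalanced = pipeline_finish_times([1, 4, 1], 4)
print(balanced[-1], unbalanced[-1])
```

Under these assumptions the balanced split finishes four requests sooner than the uneven split, because in steady state the slowest stage sets the interval between completed requests.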
[0010]
In a preferred embodiment of the above aspect, each processing node has a local memory and stores its processed data in that local memory when task processing ends, and the scheduler includes in the task end message, together with information on the completed task, information identifying the processing node in which the processed data is stored and the local address of that data. The processing node that processes the next task then acquires the processed data via the data network on the basis of the processing node information and local address included in the task end message. Each processing node has an address conversion circuit that converts between the global addresses of the system and the local addresses within the processing node.
[0011]
In this preferred embodiment, each processing node can acquire data at a time when it is able to process it. Each processing node can also perform data processing on the basis of its own local addresses and does not need to share a local address space with other processing nodes. Each processing node can therefore be built with its own architecture and use its own compiler, which increases design flexibility and the extensibility of the system.
[0012]
In another preferred embodiment of the above aspect, each scheduler starts supplying a clock to its corresponding processing node when it performs task start control on that node, and stops the clock supply in response to a task end signal from the processing node. Because each scheduler both controls the tasks of its processing node and starts and stops the node's operating clock, power consumption can be reduced.
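The clock control described here might be sketched as follows. This is only an illustration under stated assumptions: the class `ClockGatedScheduler`, its method names, and the tick-based time model are hypothetical, and the `has_pending_task` flag reflects the description of FIG. 5, where the clock is stopped only when no further task is waiting.

```python
class ClockGatedScheduler:
    """Hypothetical scheduler that enables a node's clock only while it works."""

    def __init__(self):
        self.clock_enabled = False
        self.busy = False
        self.enabled_ticks = 0  # ticks during which the clock was running

    def start_task(self):
        # Task start control: enable the clock and start the node.
        self.clock_enabled = True
        self.busy = True

    def on_task_end(self, has_pending_task=False):
        # Task end signal: keep the clock running only if more work is queued.
        self.busy = has_pending_task
        self.clock_enabled = has_pending_task

    def tick(self):
        if self.clock_enabled:
            self.enabled_ticks += 1


sc = ClockGatedScheduler()
for t in range(10):
    if t == 2:
        sc.start_task()   # task starts at tick 2
    if t == 6:
        sc.on_task_end()  # task ends at tick 6, nothing is waiting
    sc.tick()
print(sc.enabled_ticks)  # 4 of the 10 ticks had the clock enabled
```

In this run the node is clocked only for the ticks in which it is actually processing, which is the source of the power saving.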
[0013]
In still another embodiment of the above aspect of the present invention, the data network includes a data bus connected to the plurality of processing nodes, and a bus arbiter that manages the data bus.
[0014]
According to another aspect of the present invention, each scheduler supplies a task end message, by a push method, to the scheduler of the processing node that performs the next task, and each scheduler performs task start control in response to receiving a task end message. If its processing node is processing a task at that time, the scheduler waits until that task has finished. Each processing node, when it starts a task, acquires the necessary data by a pull method from the local memory of the processing node that executed the preceding task. In this way, real-time processing and pipeline processing can be achieved with simple control.
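The "start if idle, otherwise wait" behaviour of a per-node scheduler can be sketched with a small queue. The class `NodeScheduler`, its method names, and the task labels are hypothetical, and forwarding of the task end message to the next scheduler is deliberately left out to keep the example short.

```python
from collections import deque


class NodeScheduler:
    """Hypothetical per-node scheduler: start a task if idle, else queue it."""

    def __init__(self, name):
        self.name = name
        self.running = None     # task currently on the processing node
        self.waiting = deque()  # tasks whose start is deferred
        self.log = []

    def on_task_end_message(self, task):
        # Pushed by the scheduler of the node that ran the preceding task.
        if self.running is None:
            self._start(task)
        else:
            self.waiting.append(task)

    def on_task_end_signal(self):
        # Raised by this scheduler's own processing node.
        finished, self.running = self.running, None
        self.log.append(("end", finished))
        if self.waiting:
            self._start(self.waiting.popleft())
        return finished

    def _start(self, task):
        self.running = task
        self.log.append(("start", task))


sc = NodeScheduler("SC2")
sc.on_task_end_message("a2")
sc.on_task_end_message("b2")  # node is busy, so b2 waits
sc.on_task_end_signal()       # a2 ends, b2 starts
print(sc.log)
```

Because each scheduler only tracks its own node and its own queue, the control logic stays small regardless of how many nodes the system has.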
[0015]
[Embodiments of the Invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings. However, the scope of protection of the present invention is not limited to the following embodiments, but extends to the inventions described in the claims and their equivalents.
[0016]
FIG. 2 is a configuration diagram of the multi-processing node system according to the present embodiment. This multi-processing node system 1 has a block interface BIF, which is an interface with an external bus, and a scheduler SC0 corresponding to the block interface BIF. The block interface BIF has an input buffer IB0 and an output buffer OB0. Further, the multi-processing node system 1 has a plurality of, for example, four processing nodes PN1 to PN4, and schedulers SC1 to SC4 for managing processing schedules corresponding thereto. Each scheduler manages schedules only for the corresponding processing nodes, and does not perform schedule management for other processing nodes. However, a message is transferred between the schedulers to notify the end of the task processing. Accordingly, each scheduler receives the status signals S0 to S4 from the corresponding processing node, and supplies control signals C0 to C4 such as task start to the processing node.
[0017]
The processing nodes PN1 to PN4 are circuit modules such as processors that perform specific processing, general-purpose processors, digital signal processors, or reconfigurable hardware; they are not necessarily all processors of the same configuration, but are processors of different configurations suited to their respective purposes. As described later, each processing node is customized so that the processing throughput (processing amount) for the tasks of differing weights distributed to the nodes is substantially equal. That is, even if the weights of the tasks distributed to the processing nodes differ, the processing capacity of each node is matched to its tasks so that throughput at the nodes is substantially equal. The processing nodes have input buffers IB1 to IB4 and output buffers OB1 to OB4, respectively, as local memory. The schedulers SC0 to SC4 are, for example, microcontroller units that each contain a CPU, an internal memory, input/output buffers, and the like. The task control program for the corresponding processing node is stored in this internal memory as appropriate.
[0018]
The schedulers SC0 to SC4 are connected via the message network 10, over which messages relating to task control are transferred between schedulers. As described later, these messages include at least task end messages. The processing nodes PN1 to PN4 and the block interface BIF are connected via the data network 2, over which data is transferred between them. The interiors of the block interface BIF and the processing nodes PN1 to PN4 are local address spaces, whereas the data network 2 uses a global address space common to the block interface BIF and the processing nodes PN1 to PN4. Data ports DP0 to DP4, which convert between global and local addresses, are therefore provided between the data network 2 and the output buffer OB0 of the block interface BIF and the input/output buffers IB1 to IB4 and OB1 to OB4 of the processing nodes. In the example of FIG. 2, the data network 2 has four data buses DB1 to DB4, each provided with a bus arbiter circuit 12 that performs bus management. The number of data buses in the data network 2 can be chosen as appropriate for the volume of data transfer traffic.
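The patent does not specify how the data ports map local addresses to global ones. As one possible illustration, the sketch below assumes a simple base-plus-offset scheme in which each port owns a fixed window of the global address space; the class `DataPort`, the window size, and the layout are all assumptions.

```python
class DataPort:
    """Hypothetical data port mapping one node's local window into global space."""

    def __init__(self, global_base, size):
        self.global_base = global_base
        self.size = size

    def to_global(self, local_addr):
        if not 0 <= local_addr < self.size:
            raise ValueError("local address out of range")
        return self.global_base + local_addr

    def to_local(self, global_addr):
        offset = global_addr - self.global_base
        if not 0 <= offset < self.size:
            raise ValueError("global address not owned by this port")
        return offset


# Assumed layout: DP0..DP4 each own a 0x1000-byte window, DP0 first.
ports = {n: DataPort(global_base=n * 0x1000, size=0x1000) for n in range(5)}
g = ports[2].to_global(0x40)
print(hex(g), hex(ports[2].to_local(g)))
```

With a mapping of this kind, each node's software only ever sees its own local addresses, which is what allows the nodes to use different architectures and compilers.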
[0019]
FIG. 3 is a diagram illustrating message transfer between schedulers and data transfer between processing tasks according to the present embodiment. For the sake of explanation, a case will be described in which the processing node PNA processes task A and the processing node PNB subsequently processes task B. First, when the scheduler SCA supplies the start control signal CA for task A to the processing node PNA, the processing node PNA starts processing task A. When the processing of task A ends, the processing node PNA stores the processed data at a predetermined local address in the output buffer OB and supplies a status signal SA indicating the end of task A to the scheduler SCA (processing P1). At this time, the local address of the data is also supplied to the scheduler SCA as part of the status signal. In response to the task end interrupt signal SA, the scheduler SCA transfers an end message for task A to the scheduler SCB via the message network 10 (processing P2). This task end message includes information indicating that task A has ended, information identifying the processing node that stores the processed data, and the local address of that data.
[0020]
The scheduler SCB controls the schedule of the next task B in response to the task A end message. That is, when the preceding task in the processing node PNB has completed, or when the processing node PNB is not processing any task, the scheduler SCB sends the processing node PNB a control signal CB instructing it to start processing, and the processing node PNB starts processing task B. At this time, the data storage location information included in the task A end message, namely the processing node PNA information and the local address, is also given to the processing node PNB.
[0021]
In response to the start control of the task B, the processing node PNB acquires the processed data from the output buffer of the processing node PNA according to the given processing node PNA information and its local address (processing P3). This data acquisition is performed by issuing an access request together with an address to a bus in the data network. As described above, the bus management for the access request is performed by the bus arbiter 12. Then, the requested data is transferred from the output buffer OB of the processing node PNA to the input buffer IB of the processing node PNB (processing P4). In the data port DP, address conversion between the local address 30 and the global address 20 is performed.
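The exchange P1 to P4 described above (the completion message is pushed between schedulers, the data is pulled by the consuming node) might be sketched in Python as follows. The class and field names are illustrative assumptions, and the data network, bus arbitration, and address conversion are reduced to a dictionary lookup.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TaskEndMessage:
    """Fields named in the description; the Python names are illustrative."""
    task: str        # task that has ended, e.g. "A"
    node: str        # processing node holding the processed data
    local_addr: int  # local address of the data in that node's output buffer


class ProcessingNode:
    def __init__(self, name: str):
        self.name = name
        self.output_buffer: Dict[int, bytes] = {}
        self.input_buffer = b""

    def finish_task(self, task: str, addr: int, data: bytes) -> TaskEndMessage:
        # P1: store the result locally and report the end of the task.
        self.output_buffer[addr] = data
        return TaskEndMessage(task=task, node=self.name, local_addr=addr)

    def start_task(self, msg: TaskEndMessage, nodes: Dict[str, "ProcessingNode"]) -> None:
        # P3/P4: pull the processed data out of the producer's output buffer.
        producer = nodes[msg.node]
        self.input_buffer = producer.output_buffer.pop(msg.local_addr)


nodes = {"PNA": ProcessingNode("PNA"), "PNB": ProcessingNode("PNB")}
msg = nodes["PNA"].finish_task("A", addr=0x10, data=b"result-of-A")   # P1
# P2: scheduler SCA would forward `msg` to scheduler SCB over the message network.
nodes["PNB"].start_task(msg, nodes)                                   # P3, P4
```

Only the small message travels over the message network; the bulk data moves once, directly from the output buffer of the producer to the input buffer of the consumer.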
[0022]
FIG. 4 is a flowchart of task control by the scheduler according to the present embodiment. The flowchart takes as an example the scheduler of a processing node to which tasks 1 and 2 are assigned. The scheduler constantly monitors whether a completion message for the task preceding the assigned task 1 or task 2 has been received (S100, S110). This monitoring is performed by checking whether a task end message has arrived via the message network 10. The scheduler also monitors whether the corresponding processing node has completed its task processing (S120). This monitoring is performed using the status signal from the corresponding processing node.
[0023]
When a task end message is received, the scheduler checks whether the processing node is currently processing another task (S102, S112). If it is not, task start control is performed (S104, S114). If it is, the task is registered in the task queue (S106, S116). In addition, in response to the task end signal from the processing node, a task end message is transmitted via the message network 10 to the scheduler of the processing node that processes the next task (S122). After the transmission, if there is a waiting task (S124), a start control signal for the next task is given to the processing node to start the task processing (S126).
[0024]
Further, in each of the task start controls S104, S114, and S126, the scheduler confirms that the data processed by the preceding task and stored in the output buffer has been read by another processing node for the subsequent task processing, and that the output buffer is therefore empty. That is, processed data is stored in the output buffer as a result of task processing and is read by the processing node that performs the subsequent task. New task processing cannot be started until the processed data has been read by the processing node that performs the subsequent task processing.
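The control flow of FIG. 4, together with the output-buffer condition just described, can be approximated by the following Python sketch. It is a simplified model under stated assumptions: the method names, the `started` and `sent` logs, and the single flag standing in for the output buffer state are illustrative and do not appear in the specification.

```python
from collections import deque


class Scheduler:
    """Illustrative model of the FIG. 4 control flow for one processing node."""

    def __init__(self):
        self.busy = False                # node currently processing a task?
        self.output_buffer_empty = True  # previous result already read out?
        self.queue = deque()             # waiting tasks (S106, S116)
        self.current = None
        self.started = []                # tasks started, in order
        self.sent = []                   # task end messages sent (S122)

    def _try_start(self):
        # Start control (S104, S114, S126): the node must be idle and its
        # output buffer must have been emptied by the consumer.
        if not self.busy and self.output_buffer_empty and self.queue:
            self.current = self.queue.popleft()
            self.busy = True
            self.started.append(self.current)

    def on_task_end_message(self, task):
        # S100/S110: end message of a preceding task received.
        # The task waits in the queue if it cannot be started at once.
        self.queue.append(task)
        self._try_start()

    def on_node_task_end(self):
        # S120: status signal from the node; its result is now in the output buffer.
        self.busy = False
        self.output_buffer_empty = False
        self.sent.append(self.current)   # S122: notify the next scheduler
        self._try_start()                # S124/S126, blocked until the buffer is read

    def on_output_buffer_read(self):
        # The node performing the subsequent task has pulled the data.
        self.output_buffer_empty = True
        self._try_start()


sc = Scheduler()
sc.on_task_end_message("task1")    # node idle: task1 starts at once
sc.on_task_end_message("task2")    # node busy: task2 is queued
sc.on_node_task_end()              # task1 ends; output not yet read, task2 still waits
started_before_read = list(sc.started)
sc.on_output_buffer_read()         # buffer emptied: task2 starts
```

Each scheduler reacts to only three local events, which is what keeps the per-node control program small.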
[0025]
As described above, the scheduler performs start control of the tasks assigned to its processing node in response to the end message of the preceding task, and transmits a task end message to another scheduler when the task ends. Each scheduler therefore only has to perform task control for the processing node under its control, so the control is simple and the overhead of schedule control is reduced. Also, once a processing node has finished its assigned task for one processing program consisting of a series of tasks, it can process a task for the next processing program as long as no task is waiting in the queue, which makes parallel processing by the multiple processing nodes possible.
[0026]
FIG. 5 is a diagram illustrating clock control by the scheduler according to the present embodiment. The scheduler SC1 manages the operation clock in addition to the tasks of the processing node PN1. That is, when task start control is performed on the processing node (time t1), the scheduler SC1 outputs a clock enable signal to the PLL circuit and causes the PLL circuit to supply the operation clock to the processing node PN1. When completion of the task is detected from the processing node PN1 by the state signal S1 and there is no task to be executed next (time t2), the clock enable signal is disabled and the supply of the operation clock by the PLL circuit is stopped. Because the scheduler controls the operation clock in addition to the tasks, the supply of the operation clock is stopped while the processing node is not performing task processing, which saves power. In the flowchart of FIG. 4, the clock enable signal is activated in steps S104 and S114 and is deactivated when step S124 finds no waiting task.
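The clock enable behaviour of FIG. 5 can be illustrated with a small Python sketch. The PLL circuit is reduced to a boolean, and the class name ClockGate, the `waiting` counter, and the `trace` log are assumptions introduced only for this illustration.

```python
class ClockGate:
    """Illustrative clock-enable logic; the PLL is modelled as a boolean."""

    def __init__(self, waiting=0):
        self.clock_enabled = False
        self.waiting = waiting   # number of tasks waiting in the queue
        self.trace = []          # (event, clock_enabled) pairs

    def start_task(self):
        # Time t1: task start control asserts the clock enable signal.
        self.clock_enabled = True
        self.trace.append(("start", self.clock_enabled))

    def task_ended(self):
        # State signal received: keep the clock running only if a task is waiting.
        if self.waiting > 0:
            self.waiting -= 1
            self.trace.append(("start-next", self.clock_enabled))
        else:
            self.clock_enabled = False   # time t2: stop the operation clock
            self.trace.append(("idle", self.clock_enabled))


gate = ClockGate(waiting=1)
gate.start_task()    # first task starts, clock on
gate.task_ended()    # a task is waiting, clock stays on
gate.task_ended()    # nothing waiting, clock off
```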
[0027]
FIG. 6 is a diagram showing a specific example (1) of the program. In this specific example, programs A and B are required to be processed by the multi-processing node system. As shown in (a) of the figure, program A is composed of tasks a1 to a5 that are sequentially processed, and program B is composed of tasks b1 to b4 that are sequentially processed. As shown in (b) of the figure, the tasks of different weights included in these programs are distributed so that the processing amounts are equalized as far as possible across the four processing nodes PN1 to PN4. That is, for program A, the configuration of each module and the distribution of tasks are chosen so that the processing throughput of tasks a1 and a3 by module 1, the processing throughput of tasks a2 and a5 by module 2, and the processing throughput of task a4 by module 3 are almost equal. Similarly, for program B, the configuration of each module and the distribution of tasks are chosen so that the processing throughput of task b1 by module 1, of task b2 by module 2, of task b3 by module 3, and of task b4 by module 4 are almost equal. By making the processing throughput of each module the same in this way, real-time processing and pipeline processing can be performed, as shown in an example described later.
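What "almost equal processing throughput" means for program A can be made concrete with invented numbers. In the Python sketch below, every weight and capacity value is hypothetical (the specification gives none); only the assignment of tasks to modules follows the description.

```python
# All numbers are invented for illustration; the specification gives no weights.
task_weight = {"a1": 2, "a2": 3, "a3": 4, "a4": 12, "a5": 3}   # work units per data item
assignment = {"m1": ["a1", "a3"], "m2": ["a2", "a5"], "m3": ["a4"]}
capacity = {"m1": 6, "m2": 6, "m3": 12}                        # work units per time unit


def time_per_item(module: str) -> float:
    """Time a module needs to handle its share of one data item of program A."""
    work = sum(task_weight[t] for t in assignment[module])
    return work / capacity[module]


times = {m: time_per_item(m) for m in assignment}
```

In this made-up case module 3 carries a single heavy task but has twice the capacity, so all three modules need the same time per data item and no module becomes the bottleneck of the pipeline.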
[0028]
In the figure, the processing nodes PN1 to PN4 are also referred to as modules 1 to 4; where the description below refers to modules, it means the processing nodes. Each processing node has a configuration suitable for a predetermined process, and tasks are distributed according to that suitability. An example of the task management program of each scheduler, for the case where tasks are distributed to modules as in this specific example, is described below.
[0029]
FIG. 7 is an example of a task management program of the scheduler SC0. This management program includes steps P100 to P103. In step P100, upon receiving a start instruction for program A, the message “start of program A, address addr0 of the BIF, data size size” is transmitted to the scheduler of module 1. In step P101, the scheduler monitors for a message that task a5 of program A has been completed and, upon receiving it, outputs the data of size size at address addr0 of module 2 (m2) to the outside of the system. In step P102, upon receiving a start instruction for program B, the message “start of program B, address addr1 of the BIF, data size size” is transmitted to the scheduler of module 1. In step P103, the scheduler monitors for a message that task b4 of program B has been completed and, upon receiving it, outputs the data of size size at address addr0 of module 4 (m4) to the outside of the system.
[0030]
FIG. 8 is an example of a task management program for the schedulers SC1 and SC2. In the example program of the scheduler SC1, step P110 monitors for reception of the start message “pA->start” of program A; when it is received, the data is acquired by accessing the local address addr0 of the BIF included in the message, task a1 is executed, and after task a1 completes the message “pA->task.a1, m1.addr0, size” is transmitted to the scheduler of module 2. Step P111 monitors for reception of the completion message “pA->task.a2” of task a2 of program A; when it is received, the local address addr0 of module 2 included in the message is accessed to acquire the data, task a3 is executed, and after task a3 completes the message “pA->task.a3, m1.addr1, size” is transmitted to the scheduler of module 3. Step P112 is the control performed when a start message for program B is received, and is the same as step P110.
[0031]
In step P120, the task management program of the scheduler SC2 starts the next task a2 when the completion message for task a1 of program A is received, and transmits a completion message to the scheduler of module 1 when task a2 is completed. The same applies to steps P121 and P122.
[0032]
FIG. 9 is an example of a task management program for the schedulers SC3 and SC4. In step P130, the task management program of the scheduler SC3 starts the next task a4 when the completion message for task a3 of program A is received, and transmits a completion message to the scheduler of module 2 when task a4 is completed. The same applies to step P131. The task management program of the scheduler SC4 starts the next task b4 when it receives the completion message for task b3 of program B, and transmits a completion message to the scheduler of the block interface BIF when task b4 is completed.
[0033]
By installing these task management programs in the schedulers SC0 to SC4, the multi-processing node system 1 can execute programs A and B with real-time processing and pipeline processing in response to start requests from an external system bus.
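The task management programs of FIGS. 7 to 9 amount to a small routing table per scheduler. The Python sketch below collects them in one dictionary and follows the completion messages of each program. The table format and the function `trace` are illustrative assumptions, and the entries marked "inferred" are not spelled out in the text (which only says that the same applies); they are derived here from the task distribution described for FIG. 6.

```python
# (scheduler, program, received event) -> (task to start, scheduler notified afterwards)
ROUTES = {
    ("SC1", "pA", "start"): ("a1", "SC2"),  # P110
    ("SC1", "pA", "a2"):    ("a3", "SC3"),  # P111
    ("SC1", "pB", "start"): ("b1", "SC2"),  # P112
    ("SC2", "pA", "a1"):    ("a2", "SC1"),  # P120
    ("SC2", "pA", "a4"):    ("a5", "SC0"),  # P121 (inferred)
    ("SC2", "pB", "b1"):    ("b2", "SC3"),  # P122 (inferred)
    ("SC3", "pA", "a3"):    ("a4", "SC2"),  # P130
    ("SC3", "pB", "b2"):    ("b3", "SC4"),  # P131 (inferred)
    ("SC4", "pB", "b3"):    ("b4", "SC0"),  # scheduler SC4
}


def trace(program: str) -> list:
    """Follow completion messages from the start message until they reach SC0."""
    scheduler, event, path = "SC1", "start", []
    while scheduler != "SC0":
        task, next_scheduler = ROUTES[(scheduler, program, event)]
        path.append((scheduler, task))
        scheduler, event = next_scheduler, task
    return path


path_a = trace("pA")
path_b = trace("pB")
```

Changing how a program is distributed over the modules then only means changing table entries, with no central scheduler involved.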
[0034]
FIG. 10 is a timing chart showing pipeline processing for programs A and B in the specific example (1). The horizontal axis represents time, and the chart shows the data supplied to the block interface BIF and the tasks performed by each module. In this example, execution requests for program A and program B are made alternately, and data A and data B are supplied alternately to the block interface BIF. As described above, the hardware configuration and the task distribution are chosen so that the task processing throughputs of modules 1 to 4 are substantially the same.
[0035]
In FIG. 10, data A is supplied to the block interface BIF in the pipeline stage PS1, and an execution request of the program A is made. In response, in the next pipeline stage PS2, module 1 processes task a1, and module 2 processes task a2 in response to its completion message. At the same stage PS2, the data B is supplied to the block interface BIF, and the execution request of the program B is performed.
[0036]
In response to this execution request, the module 1 processes the task b1 in the next stage PS3. During this time, the processing of task a3 for data A in stage PS1 is queued. The data A is supplied at this stage PS3.
[0037]
In the stage PS4, the module 1 processes the task a1 in response to the data A in the stage PS3, and further, the module 1 processes the task a3 for the data A in the stage PS1. At the stage PS4, since the module 2 is processing the task b2, the module 2 cannot process the task a2 for the data A at the stage PS3.
[0038]
In the stage PS5, the module 2 processes the task a2 for the data A of the stage PS3 in the waiting state, and the module 3 processes the task a4. At the same time, module 1 processes task b1 for data B.
[0039]
Further, in the stage PS6, the module 2 processes the task b2 for the data B of the stage PS4, and the module 3 processes the task b3 for the data B of the stage PS2.
[0040]
Then, at stage PS7, task a5 for data A of stage PS1 is processed by module 2, and task b4 for data B of stage PS2 is processed by module 4. That is, the processing of the first program A and the next program B is completed in the stage PS7.
[0041]
Thereafter, as data A and data B are supplied alternately, module 1 alternates between processing tasks a1 and a3 and processing task b1, module 2 alternates between processing tasks a2 and a5 and processing task b2, module 3 alternates between processing task a4 and processing task b3, and module 4 processes task b4 intermittently. That is, with the task processing of each processing node saturated, the programs A and B requested at each stage are pipelined. Data A and B are therefore processed in real time.
[0042]
Although not shown, when data A is supplied at stage PS2, task a1 in module 1 cannot be started unless the processed data in the output buffer of module 1 has been read out in the course of the processing of task a2 in module 2. Therefore, even if module 1 has spare processing capacity in stage PS2, it performs the processing of task a1 in stage PS3. When the capacity of the output buffer for task a1 is large, a plurality of tasks a1 can be processed in succession in the initial state. However, once the pipeline processing is saturated, the processing is as shown in the figure.
[0043]
FIG. 11 is a diagram illustrating a specific example (2) of the program. In this specific example, programs A and B are programs required to be processed by the multi-processing node system. As shown in (a) of the figure, program A has tasks a1 to a5 and includes a branch instruction that executes one of tasks a2 and a3 when the state at the completion of task a1 is case 1, and executes both tasks a2 and a3 when the state at the completion of task a1 is case 2. Tasks a1 and a5 are distributed to processing node PN1, and tasks a2, a3, and a4 are distributed to processing nodes PN2, PN3, and PN4, respectively. This distribution is achieved through the task management program written for the scheduler of each processing node.
[0044]
Further, as described above, the module configuration and the task distribution are performed so that the processing throughput of the distributed tasks in each module is substantially equal.
[0045]
FIG. 12 is an example of a task management program for the schedulers SC0 and SC1. In the management program of the scheduler SC0, in step P100, upon receiving a start instruction for program A, the message “start of program A, address addr0 of the BIF, data size size” is transmitted to the scheduler of module 1. In step P104, the scheduler monitors for a message that task a5 of program A has been completed and, upon receiving it, outputs the data of size size at address addr1 of module 1 (m1) to the outside of the system.
[0046]
FIG. 12 also shows an example of the program of the scheduler SC1 in case 1. Step P113 monitors for reception of the start message “pA->start” of program A; when it is received, the local address addr0 of the BIF included in the message is accessed to acquire the data and task a1 is executed. After task a1 completes, if the state of module 1 is “task 2”, the message “pA->task.a1, m1.addr0, size” is transmitted to the scheduler of module 2, and if the state of module 1 is “task 3”, the message “pA->task.a1, m1.addr0, size” is transmitted to the scheduler of module 3. Step P114 monitors for reception of the completion message “pA->task.a5” of task a5 of program A; when it is received, the local address addr0 of module 4 included in the message is accessed to acquire the data, task a5 is executed, and after task a5 completes the message “pA->task.a5, m1.addr1, size” is transmitted to the BIF scheduler.
[0047]
FIG. 13 is an example of a task management program for the schedulers SC2 and SC3. In step P123, the task management program of the scheduler SC2 starts the next task a2 when the completion message for task a1 of program A is received, and transmits a completion message to the scheduler of module 4 when task a2 is completed. In step P132, the task management program of the scheduler SC3 starts the next task a3 when the completion message for task a1 of program A is received, and transmits a completion message to the scheduler of module 4 when task a3 is completed.
[0048]
FIG. 14 is an example of a task management program of the scheduler SC4. This is also an example of case 1. In Step P141, when the completion message of the task a2 of the program A is received, the next task a4 is started, and when it is completed, the completion message is transmitted to the scheduler of the BIF. In step P142, when the completion message of the task a3 of the program A is received, the next task a4 is started, and when it is completed, the completion message is transmitted to the BIF scheduler.
[0049]
FIG. 15 is an example of a task management program of the schedulers SC1 and SC4 in case 2. In the example program of the scheduler SC1, step P115 monitors for reception of the start message “pA->start” of program A; when it is received, the data is acquired by accessing the local address addr0 of the BIF included in the message, task a1 is executed, and after task a1 completes the message “pA->task.a1, m1.addr0, size” is transmitted to the schedulers of modules 2 and 3. Step P114 is as described above.
[0050]
In the task management program of the scheduler SC4, step P143 monitors for the completion messages “pA->task.a2” and “pA->task.a3” of tasks a2 and a3 of program A. When both have been received, the local addresses addr0 of modules 2 and 3 included in the messages are accessed to acquire both sets of data, task a4 is executed, and after task a4 ends the message “pA->task.a4, m4.addr0, size” is transmitted to the BIF scheduler.
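The branch after task a1 and the joining behaviour of step P143 might be sketched in Python as follows. The names JoinScheduler and route_after_a1 are hypothetical, and the state values "a2" and "a3" are an assumed encoding of the "task 2" and "task 3" states mentioned for step P113.

```python
class JoinScheduler:
    """Sketch of step P143: start a4 only after both a2 and a3 have reported completion."""

    def __init__(self, required=("a2", "a3")):
        self.required = set(required)
        self.received = {}       # completed task -> (module, local address)
        self.started = False

    def on_completion(self, task, module, addr):
        self.received[task] = (module, addr)
        if not self.started and self.required.issubset(self.received):
            self.started = True  # task a4 would pull both inputs at this point
        return self.started


def route_after_a1(case, state=None):
    """Schedulers that receive the a1 completion message (P113 in case 1, P115 in case 2)."""
    if case == 2:
        return ["SC2", "SC3"]                       # fork: both a2 and a3 run
    return ["SC2"] if state == "a2" else ["SC3"]    # branch: exactly one of them runs


join = JoinScheduler()
first = join.on_completion("a2", "m2", 0x0)    # only a2 has finished
second = join.on_completion("a3", "m3", 0x0)   # both finished: a4 may start
```

In case 1 the scheduler SC4 uses the simpler steps P141 and P142 instead, starting task a4 on whichever single completion message arrives.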
[0051]
FIG. 16 is a timing chart showing the pipeline processing of program A in the specific example (2). FIG. 16A shows an example of case 1 and FIG. 16B shows an example of case 2. As described above, the configuration of each module and the task distribution are chosen so that the processing throughput of tasks a1 and a5 by module 1, of task a2 by module 2, of task a3 by module 3, and of task a4 by module 4 are substantially equal. The pipeline processing in each of cases 1 and 2 is described below for the situation in which data is supplied to the block interface at every pipeline stage together with a request to start program A.
[0052]
In the case of FIG. 16A, for the data 1 supplied at pipeline stage PS1, module 1 processes task a1 at the next stage PS2. Upon completion of task a1, module 2 processes task a2. For the data 2 supplied at stage PS2, task a1 is not started until module 2 has read out the processed data in the output buffer of module 1 in the course of processing task a2.
[0053]
In stage PS3, module 1 processes task a1 for data 2. Further, in response to the completion of the processing of the task a2, the module 4 starts the processing of the task a4. Also, as a result of processing task a1 for data 2, assuming that the state of module 1 is task a3, module 3 starts processing task a3 at stage PS3.
[0054]
In stage PS4, module 1 processes task a1 for data 3, and processes task a5 for data 1 in the remaining time. As a result of the processing of task a1 on data 3, the state of task a2 is reached, and module 2 is processing task a2. Further, the module 4 processes the task a4 for the data 2.
[0055]
Thereafter, at each pipeline stage, module 1 processes tasks a1 and a5, module 2 or 3 processes task a2 or a3, and module 4 processes task a4. When data is supplied for each stage, saturation occurs in this state.
[0056]
In the case of FIG. 16B, after task a1 is processed, tasks a2 and a3 are processed simultaneously. Otherwise, it is the same as FIG. 16A.
[0057]
As described above, in the embodiment, the plurality of processing nodes are given different configurations so that the processing throughput is equal for the tasks of different weights distributed to the respective processing nodes. The scheduler provided for each processing node transfers task completion messages and controls the start of the next task in response to a task completion message. As a result, pipeline processing becomes possible, and real-time processing of data becomes possible.
[0058]
As described above, the embodiments are summarized as follows.
[0059]
(Supplementary Note 1) In the multi-processing node system,
A plurality of processing nodes, each performing a predetermined task and configured so that the processing throughputs for the distributed tasks, which have different weights, are substantially equal;
A data network connected to the plurality of processing nodes and performing data transfer between the plurality of processing nodes;
A plurality of schedulers provided corresponding to each of the plurality of processing nodes, performing a task start control to the corresponding processing node, and further receiving a task end signal and performing a task schedule control;
A message network that is connected to the plurality of schedulers and that transfers messages between the plurality of schedulers;
A multi-processing node system, wherein the scheduler performs task start control on the corresponding processing node in response to a task end message transmitted from another scheduler and, in response to a task end signal from the corresponding processing node, transmits a task end message via the message network to the scheduler of the processing node that performs the next task.
[0060]
(Supplementary Note 2) In Supplementary Note 1,
The processing node has a local memory and stores the processed data in the local memory at the end of the task processing,
The scheduler includes in the task end message, together with the completed task information, the processing node information and the local address at which the processed data is stored,
A multi-processing node system, wherein the processing node that processes the subsequent task obtains the processed data via the data network based on the processing node information and the local address included in the task end message.
[0061]
(Supplementary Note 3) In Supplementary note 1,
A multi-processing node system, wherein each processing node has, between the processing node and the data network, an address translation circuit for translating between a global address in the system and a local address in the processing node.
[0062]
(Supplementary Note 4) In Supplementary Note 1,
A multi-processing node system, wherein each scheduler starts clock supply to the corresponding processing node when task start control is performed on the corresponding processing node, and stops the clock supply in response to a task end signal from the processing node.
[0063]
(Supplementary Note 5) In Supplementary Note 1,
A multi-processing node system, wherein the data network includes a data bus connected to the plurality of processing nodes, and a bus arbiter for managing the data bus.
[0064]
(Supplementary Note 6) In Supplementary Note 1,
A multi-processing node system, wherein the scheduler monitors reception of a task completion message from another scheduler and reception of a task end signal from the corresponding processing node; when a task completion message is received while the corresponding processing node is executing a task, the scheduler makes the task corresponding to that task completion message wait to start, and it controls the start of the waiting task in response to receiving the task end signal.
[0065]
(Supplementary Note 7) In Supplementary Note 1,
A multi-processing node system, wherein the plurality of processing nodes process a plurality of tasks in parallel to perform pipeline processing.
[0066]
(Supplementary Note 8) In the multi-processing node system,
A plurality of processing nodes each performing a predetermined task,
A data network connected to the plurality of processing nodes and performing data transfer between the plurality of processing nodes;
A plurality of schedulers provided corresponding to each of the plurality of processing nodes and performing task start control to the corresponding processing nodes;
A message network that is connected to the plurality of schedulers and that transfers messages between the plurality of schedulers;
Each scheduler transfers, in a push manner, a task completion message to the scheduler of the processing node performing the next task in response to a task end signal from the corresponding processing node, and each scheduler performs the task start control in response to reception of the task completion message,
A multi-processing node system, wherein each processing node acquires necessary data from a local memory of a processing node that has executed a previous task by a pull method when task processing is started.
[0067]
(Supplementary Note 9) In Supplementary note 8,
A multi-processing node system, wherein the task completion message includes a local address of processed data.
[0068]
(Supplementary Note 10) In Supplementary Note 8,
A multi-processing node system, wherein, if the corresponding processing node is processing a task when the scheduler receives the task completion message, the scheduler makes the task start control wait until that task is completed.
[0069]
[Effects of the Invention]
As described above, according to the present invention, a scheduler is provided for each of a plurality of processing nodes. When task processing is completed in the corresponding processing node, the scheduler transmits a task completion message to the other scheduler that is to process the next task, and when a task completion message is received from another scheduler, the scheduler instructs the corresponding processing node to start processing the task. Real-time processing and pipeline processing can therefore be realized with simple schedule management.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration example of a conventional multi-processing node system.
FIG. 2 is a configuration diagram of a multi-processing node system according to the present embodiment.
FIG. 3 is a diagram illustrating message transfer between schedulers and data transfer between processing tasks according to the present embodiment.
FIG. 4 is a flowchart of task control of a scheduler according to the present embodiment.
FIG. 5 is a diagram illustrating clock control of a scheduler according to the present embodiment.
FIG. 6 is a diagram showing a specific example (1) of a program.
FIG. 7 is an example of a task management program of the scheduler SC0.
FIG. 8 is an example of a task management program for schedulers SC1 and SC2.
FIG. 9 is an example of a task management program for schedulers SC3 and SC4.
FIG. 10 is a timing chart showing a pipeline process for programs A and B in a specific example (1).
FIG. 11 is a diagram showing a specific example (2) of a program.
FIG. 12 is an example of a task management program for schedulers SC0 and SC1.
FIG. 13 is an example of a task management program for schedulers SC2 and SC3.
FIG. 14 is an example of a task management program of the scheduler SC4.
FIG. 15 is an example of a task management program of schedulers SC1 and SC4 in case 2.
FIG. 16 is a timing chart of the task processing of the program A in the specific example (2).
[Explanation of symbols]
PN1 to PN4: processing nodes, SC0 to SC4: scheduler
2: Data network, 10: Message network
12: Bus Arbiter

Claims

In a multi-processing node system,
A plurality of processing nodes, each performing a predetermined task and configured so that the processing throughputs for the distributed tasks, which have different weights, are substantially equal;
A data network connected to the plurality of processing nodes and performing data transfer between the plurality of processing nodes;
A plurality of schedulers provided corresponding to each of the plurality of processing nodes, performing a task start control to the corresponding processing node, and further receiving a task end signal and performing a task schedule control;
A message network that is connected to the plurality of schedulers and that transfers messages between the plurality of schedulers;
A multi-processing node system, wherein the scheduler performs task start control on the corresponding processing node in response to a task end message transmitted from another scheduler and, in response to a task end signal from the corresponding processing node, transmits a task end message via the message network to the scheduler of the processing node that performs the next task.

In claim 1,
The processing node has a local memory and stores the processed data in the local memory at the end of the task processing,
The scheduler includes in the task end message, together with the completed task information, the processing node information and the local address at which the processed data is stored,
A multi-processing node system, wherein the processing node that processes the subsequent task obtains the processed data via the data network based on the processing node information and the local address included in the task end message.

In claim 1,
A multi-processing node system, wherein each processing node has, between the processing node and the data network, an address translation circuit for translating between a global address in the system and a local address in the processing node.

In claim 1,
A multi-processing node system, wherein each scheduler starts clock supply to the corresponding processing node when task start control is performed on the corresponding processing node, and stops the clock supply in response to a task end signal from the processing node.

In a multi-processing node system,
A plurality of processing nodes each performing a predetermined task,
A data network connected to the plurality of processing nodes and performing data transfer between the plurality of processing nodes;
A plurality of schedulers provided corresponding to each of the plurality of processing nodes and performing task start control to the corresponding processing nodes;
A message network that is connected to the plurality of schedulers and that transfers messages between the plurality of schedulers;
Each scheduler transfers, in a push manner, a task completion message to the scheduler of the processing node performing the next task in response to a task end signal from the corresponding processing node, and each scheduler performs the task start control in response to reception of the task completion message,
A multi-processing node system, wherein each processing node acquires necessary data from a local memory of a processing node that has executed a previous task by a pull method when task processing is started.

In claim 5,
A multi-processing node system, wherein the task completion message includes a local address of processed data.