JP5708450B2

JP5708450B2 - Multi-core processor system, register utilization method, and register utilization program

Info

Publication number: JP5708450B2
Application number: JP2011246959A
Authority: JP
Inventors: 俊也大友; 浩一郎山下; 貴久鈴木; 宏真山内; 康志栗原
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-11-10
Filing date: 2011-11-10
Publication date: 2015-04-30
Anticipated expiration: 2031-11-10
Also published as: JP2013105217A

Description

The present invention relates to a multi-core processor system, a register utilization method, and a register utilization program.

In recent years, an increasing number of devices have adopted a multi-core processor system, in which a single system contains a plurality of cores. By dividing application software (hereinafter, an "app") into a plurality of threads and running those threads in parallel on the cores, a multi-core processor system can process faster than a single core can. A thread is a unit of program execution.

In addition, by making the amount of processing per thread smaller and exploiting fine-grained parallelism, a multi-core processor system can improve the performance of thread-level parallel processing. Fine-grained threads execute while sharing registers with one another. In code that shares a register, for example, a thread that reads a value from the register performs a synchronization wait, and a thread that writes a value to the register sends a synchronization notification to the waiting thread. One register-sharing technique has a core execute its thread using the internal registers of another core instead of its own. Another disclosed technique writes a value to the registers of the other processors whenever a CPU writes that value to its own register (see, for example, Patent Documents 1 and 2 below).

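Patent Document 1: Japanese Unexamined Patent Application Publication No. H6-231085
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2003-99249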

However, in the conventional techniques described above, the number of synchronization notifications and the number of synchronization waits may be biased in some threads and even in others, and because the processing capability of each core differs depending on the register-sharing technique used, the processing capability of the cores as a whole can decrease.

To solve the problems of the conventional techniques described above, an object of the present invention is to provide a multi-core processor system, a register utilization method, and a register utilization program that can improve processing capability.

To solve the problems described above and achieve the object, one aspect of the present invention proposes a multi-core processor system, a register utilization method, and a register utilization program that acquire information indicating that a value based on the difference between the number of synchronization notifications and the number of synchronization waits for at least one of the threads assigned to the respective cores is greater than a predetermined value and, when the information is acquired, execute the threads on the plurality of cores by having the other cores share the registers of the core that executes the synchronization notifications.

Another aspect of the present invention proposes a multi-core processor system, a register utilization method, and a register utilization program that acquire information indicating that, for every thread assigned to the respective cores, the value based on the difference between the number of synchronization notifications and the number of synchronization waits is less than or equal to a predetermined value and, when the information is acquired, execute the threads on the plurality of cores by copying a register value to the registers of the other cores each time the register value of any one of the cores is updated.

Yet another aspect of the present invention proposes a multi-core processor system, a register utilization method, and a register utilization program that determine whether a value based on the difference between the number of synchronization notifications and the number of synchronization waits for at least one of the threads assigned to the respective cores is greater than a predetermined value and, when the value is determined to be greater than the predetermined value, execute the threads on the plurality of cores by having the other cores share the registers of the core that executes the synchronization notifications.

According to one aspect of the present invention, processing capability can be improved.

FIG. 1 is an explanatory diagram illustrating an example of thread assignment in which the number of synchronization notifications and the number of synchronization waits are biased.
FIG. 2 is an explanatory diagram illustrating an example of thread assignment in which the number of synchronization notifications and the number of synchronization waits are not biased.
FIG. 3 is a block diagram illustrating an example of the hardware of the multi-core processor system.
FIG. 4 is an explanatory diagram of an example of the types of synchronization instructions.
FIG. 5 is a block diagram illustrating an example of the functions of the multi-core processor system.
FIG. 6 is an explanatory diagram illustrating an example of the stored contents of the profile information.
FIG. 7 is an explanatory diagram illustrating an example of the execution result of threads whose synchronization instructions are biased.
FIG. 8 is an explanatory diagram illustrating an example of the execution result of threads whose synchronization instructions are not biased.
FIG. 9 is an explanatory diagram illustrating an example of how the register value sharing method is determined.
FIG. 10 is an explanatory diagram illustrating an example of the preconditions for the first thread group.
FIG. 11 is an explanatory diagram illustrating an example of the result of executing the first thread group using the sharing method or the copying method.
FIG. 12 is an explanatory diagram illustrating an example of the preconditions for the second thread group.
FIG. 13 is an explanatory diagram illustrating an example of the result of executing the second thread group using the sharing method or the copying method.
FIG. 14 is an explanatory diagram illustrating an example of the preconditions for the third thread group.
FIG. 15 is an explanatory diagram illustrating an example of the result of executing the third thread group using the sharing method or the copying method.
FIG. 16 is a flowchart illustrating an example of register utilization processing.
FIG. 17 is a flowchart illustrating another example of register utilization processing.
FIG. 18 is an explanatory diagram illustrating an application example of a system using a computer according to the present embodiment.

Embodiments of the disclosed multi-core processor system, register utilization method, and register utilization program are described in detail below with reference to the accompanying drawings.

FIG. 1 is an explanatory diagram illustrating an example of thread assignment in which the number of synchronization notifications and the number of synchronization waits are biased. The multi-core processor system 100 shown in FIG. 1 includes CPU #0 to CPU #2 as a plurality of CPUs, which are connected by a bus 101. CPU #0 to CPU #2 each have registers R0 to R4 and share the values of those registers under the control of register I/F 102 #0 to register I/F 102 #2.

First, CPU #0 acquires information indicating that a value based on the difference between the number of synchronization notifications and the number of synchronization waits for threads A_0 to A_2, which are assigned to CPU #0 to CPU #2, is greater than a predetermined value. In the following description, this information is referred to as information indicating that the synchronization instructions are biased. Threads A_0 to A_2 are assumed to perform fine-grained parallel processing and therefore need to share register values among CPU #0 to CPU #2.

The number of synchronization notifications is the number of times a synchronization notification, one kind of synchronization instruction, is executed. The number of synchronization waits is the number of times a synchronization wait, another kind of synchronization instruction, is executed. The synchronization instructions are described in detail with reference to FIG. 4, and a specific value for the predetermined value is given later with reference to FIG. 9.

For thread A_0, the number of synchronization notifications is 6 and the number of synchronization waits is 0, so the difference is 6. If the predetermined value is 3, the difference exceeds the predetermined value, and CPU #0 therefore acquires information indicating that the synchronization instructions are biased. CPU #0 then notifies register I/F 102 #0 to register I/F 102 #2 that the registers of CPU #0 are the sharing source and that CPU #1 and CPU #2 are to access the registers of CPU #0 through the register I/F 102.
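The bias check for thread A_0 can be written out as a short Python sketch. The text only says "a value based on the difference", so taking the absolute difference is an assumption here, and the function names are hypothetical.

```python
def sync_bias(notify_count: int, wait_count: int) -> int:
    """Value based on the difference between notifications and waits.

    Assumption: the absolute difference is used; the source only says
    "a value based on the difference".
    """
    return abs(notify_count - wait_count)


def is_biased(notify_count: int, wait_count: int, threshold: int) -> bool:
    """True when the value is greater than the predetermined value."""
    return sync_bias(notify_count, wait_count) > threshold


THRESHOLD = 3  # predetermined value used in the example for thread A_0
a0_value = sync_bias(6, 0)               # thread A_0: 6 notifications, 0 waits
a0_biased = is_biased(6, 0, THRESHOLD)   # 6 > 3, so the instructions are biased
```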

In this way, the multi-core processor system 100 executes threads A_0 to A_2 on CPU #0 to CPU #2 by having CPU #1 and CPU #2 share the registers of CPU #0, which is the CPU among CPU #0 to CPU #2 that executes the synchronization notifications. Hereinafter, the register utilization method shown in FIG. 1 is referred to as the sharing method. In the sharing method, the sharing-source CPU processes faster because it accesses its own registers, and the other CPUs process more slowly because they access the sharing-source CPU via the bus 101.
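As a rough illustration of the sharing method, the following Python sketch models a register file that lives only in the sharing-source CPU and counts the accesses that would have to cross the bus. The class and its access counter are hypothetical modeling devices, not part of the disclosed register I/F.

```python
class SharedRegisters:
    """Sharing method: registers R0..R4 exist only in the owner CPU."""

    def __init__(self, owner: int, num_regs: int = 5):
        self.owner = owner
        self.regs = [0] * num_regs
        self.bus_accesses = 0  # accesses made by CPUs other than the owner

    def _touch(self, cpu: int) -> None:
        # Only non-owner CPUs pay for a trip over the bus.
        if cpu != self.owner:
            self.bus_accesses += 1

    def read(self, cpu: int, idx: int) -> int:
        self._touch(cpu)
        return self.regs[idx]

    def write(self, cpu: int, idx: int, value: int) -> None:
        self._touch(cpu)
        self.regs[idx] = value


shared = SharedRegisters(owner=0)
shared.write(0, 1, 42)     # CPU #0 writes R1 locally
value = shared.read(1, 1)  # CPU #1 reads R1 through the bus
```

In this model every access by the owner is free of bus traffic, which mirrors why the CPU that issues the notifications is the natural sharing source.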

In the state shown in FIG. 1, CPU #0, which performs the synchronization notifications, processes faster, so the waiting time of CPU #1 and CPU #2, which perform the synchronization waits, decreases and the overall utilization efficiency of CPU #0 to CPU #2 improves.

FIG. 2 is an explanatory diagram illustrating an example of thread assignment in which the number of synchronization notifications and the number of synchronization waits are not biased. In the multi-core processor system 100 shown in FIG. 2, CPU #0 acquires information indicating that a value based on the difference between the number of synchronization notifications and the number of synchronization waits for threads B_0 to B_2, which are assigned to CPU #0 to CPU #2, is less than or equal to a predetermined value. In the following description, this information is referred to as information indicating that the synchronization instructions are not biased. Threads B_0 to B_2 are assumed to perform fine-grained parallel processing and therefore need to share register values among CPU #0 to CPU #2.

For thread B_0, the number of synchronization notifications is 3 and the number of synchronization waits is 3, so the difference is 0. If the predetermined value is 3, the difference is less than or equal to the predetermined value, and CPU #0 therefore acquires information indicating that the synchronization instructions are not biased. CPU #0 likewise acquires information indicating that the synchronization instructions of threads B_1 and B_2 are not biased. CPU #0 then notifies register I/F 102 #0 to register I/F 102 #2 that each time the value of a CPU's own register is updated, the value is to be copied to the registers of the other CPUs.

In this way, the multi-core processor system 100 executes threads B_0 to B_2 on CPU #0 to CPU #2 by copying a register value to the registers of the other CPUs each time the register value of any of CPU #0 to CPU #2 is updated. Hereinafter, the register utilization method shown in FIG. 2 is referred to as the copying method. In the copying method, a register read involves no copying and is therefore fast, whereas a register write triggers copying and is therefore slow. The processing capability of every CPU is the same under the copying method.
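The copying method can be modeled in the same hedged way: every CPU keeps its own copy of the registers, reads are local, and each write is propagated to the other CPUs. The class name and the copy counter are hypothetical.

```python
class CopiedRegisters:
    """Copying method: each CPU holds its own copy of R0..R4."""

    def __init__(self, num_cpus: int = 3, num_regs: int = 5):
        self.regs = [[0] * num_regs for _ in range(num_cpus)]
        self.copies = 0  # number of register copies sent to other CPUs

    def read(self, cpu: int, idx: int) -> int:
        # Reads never leave the CPU, so they are fast.
        return self.regs[cpu][idx]

    def write(self, cpu: int, idx: int, value: int) -> None:
        # Writes update the local register and are copied to every other CPU.
        self.regs[cpu][idx] = value
        for other in range(len(self.regs)):
            if other != cpu:
                self.regs[other][idx] = value
                self.copies += 1


copied = CopiedRegisters()
copied.write(2, 3, 7)  # CPU #2 writes R3; the value is copied to CPU #0 and #1
```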

In the state shown in FIG. 2, the synchronization instructions are not biased and the processing capability of every CPU is the same, so the synchronization waiting time decreases and, as a result, the overall utilization efficiency improves. As shown in FIGS. 1 and 2, the multi-core processor system 100 uses the sharing method, which biases the processing capability of the CPUs, when the synchronization instructions of the threads are biased, and uses the copying method, which does not bias the processing capability of the CPUs, when they are not. Matching the bias in each thread's synchronization instructions to the bias in each CPU's processing capability in this way improves the overall processing capability. The multi-core processor system 100 that operates as shown in FIGS. 1 and 2 is described below with reference to FIGS. 3 to 18.
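The choice between the two methods can be summarized as the following Python sketch. The counts for thread A_1 are those given for its profile information elsewhere in this description; the counts for B_1 and B_2 are assumed to equal those of B_0, and picking the sharing source as the thread with the largest excess of notifications is also an assumption.

```python
def choose_method(profiles, threshold):
    """profiles maps a thread name to (notifications, waits).

    Returns ("sharing", source_thread) if any thread is biased,
    otherwise ("copying", None).
    """
    excess = {t: n - w for t, (n, w) in profiles.items()}
    if all(abs(v) <= threshold for v in excess.values()):
        return ("copying", None)
    # Assumption: the thread issuing the most excess notifications runs on
    # the CPU whose registers become the sharing source.
    source = max(excess, key=excess.get)
    return ("sharing", source)


group_a = {"A_0": (6, 0), "A_1": (0, 6)}
group_b = {"B_0": (3, 3), "B_1": (3, 3), "B_2": (3, 3)}  # B_1, B_2 assumed
method_a = choose_method(group_a, 3)
method_b = choose_method(group_b, 3)
```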

(Hardware of the multi-core processor system 100)
FIG. 3 is a block diagram illustrating an example of the hardware of the multi-core processor system. The multi-core processor system 100 in the present embodiment is assumed to be a mobile terminal such as a mobile phone. In FIG. 3, the multi-core processor system 100 includes CPUs 301, a ROM (Read-Only Memory) 302, and a RAM (Random Access Memory) 303. The multi-core processor system 100 also includes a flash ROM 304, a flash ROM controller 305, and a flash ROM 306. As input/output devices for the user and other equipment, the multi-core processor system 100 includes a display 307, an I/F (Interface) 308, and a keyboard 309. These units are connected to one another by the bus 101.

The CPUs 301 govern overall control of the multi-core processor system 100. The CPUs 301 include CPU #0 to CPU #2, although the multi-core processor system 100 may include any number of CPUs that is two or more. The CPUs 301 may have dedicated cache memories. The multi-core processor system 100 may also be a multi-core processor system that includes a plurality of cores. A multi-core processor system is a computer system that includes a processor equipped with a plurality of cores. As long as a plurality of cores is provided, the system may use a single processor with a plurality of cores or a group of single-core processors arranged in parallel. The present embodiment is described using, as an example, a configuration in which CPUs that are single-core processors are arranged in parallel.

The ROM 302 stores programs such as a boot program. The RAM 303 is used as a work area for the CPUs 301. The flash ROM 304 is a flash ROM with a high read speed, for example, a NOR flash memory. The flash ROM 304 stores, for example, system software such as an OS (Operating System) and apps. When the OS is updated, for example, the multi-core processor system 100 receives the new OS through the I/F 308 and replaces the old OS stored in the flash ROM 304 with the new OS received.

The flash ROM controller 305 controls the reading and writing of data from and to the flash ROM 306 under the control of the CPUs 301. The flash ROM 306 is a flash ROM intended mainly for storing and transporting data, for example, a NAND flash memory. The flash ROM 306 stores data written under the control of the flash ROM controller 305. Specific examples of the data include image data and video data that a user of the multi-core processor system 100 has acquired through the I/F 308, and a program that executes the register utilization method according to the present embodiment. A memory card, an SD card, or the like can be used as the flash ROM 306.

The display 307 displays a cursor, icons, and toolboxes as well as data such as documents, images, and function information. A TFT (Thin Film Transistor) liquid crystal display, for example, can be used as the display 307.

The I/F 308 is connected through a communication line to a network 310 such as a LAN, a WAN (Wide Area Network), or the Internet, and is connected to other devices via the network 310. The I/F 308 serves as the interface between the network 310 and the internal components and controls the input and output of data from and to external devices. A modem or a LAN adapter, for example, can be used as the I/F 308.

The keyboard 309 has keys for entering numbers, various instructions, and the like, and is used to input data. The keyboard 309 may also be a touch-panel input pad, a numeric keypad, or the like.

FIG. 4 is an explanatory diagram of an example of the types of synchronization instructions. The diagram denoted by reference numeral 401 shows example execution code for threads A_0 and A_1, the diagram denoted by reference numeral 402 shows the execution results of threads A_0 and A_1, and table 403 shows the characteristics of the synchronization instructions that follow from 401 and 402. Threads A_0 and A_1 both use register R1, and synchronization instructions are inserted so that the order of writes to and reads from register R1 does not change.

In the following description, the position of a synchronization instruction in the execution code is defined as a synchronization point, and reaching a position at which the synchronization instruction can be executed is referred to as reaching the synchronization point. There are three kinds of synchronization instruction: synchronization notification, synchronization wait, and barrier synchronization. Barrier synchronization lets processing proceed to the next step once all threads in a specific group have arrived at the synchronization point. This specific group is defined as a synchronization group.

First, CPU #0, which executes thread A_0, writes the sum of registers R2 and R3 to register R1 as a preceding instruction at time t0 and, at time t2, executes the syncs instruction, which is a synchronization notification, with CPU #1 as the notification destination. CPU #1, which executes thread A_1, executes a preceding instruction at time t0 and, at time t1, which is earlier than time t2, executes the syncr instruction, which is a synchronization wait, with CPU #0 as the notification source. At time t1, CPU #0 has not yet reached the synchronization point, so CPU #1 waits until it receives the synchronization notification. At time t3, when the synchronization notification completes, CPU #0 executes its subsequent instruction; at the same time, CPU #1 ends its synchronization wait and executes its subsequent instruction.
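The notification and wait pairing described here behaves much like a one-shot event between two software threads. The following Python sketch is only an analogy using `threading.Event`; the syncs and syncr instructions themselves are hardware instructions, and the initial values of R2 and R3 are made up for illustration.

```python
import threading

regs = {"R1": 0, "R2": 2, "R3": 3}  # hypothetical initial register values
notified = threading.Event()         # stands in for the CPU #0 -> CPU #1 notification
result = []


def thread_a0():
    regs["R1"] = regs["R2"] + regs["R3"]  # preceding instruction: R1 = R2 + R3
    notified.set()                         # like syncs: notify without blocking


def thread_a1():
    notified.wait()                        # like syncr: block until notified
    result.append(regs["R1"])              # subsequent instruction reads R1


t1 = threading.Thread(target=thread_a1)
t0 = threading.Thread(target=thread_a0)
t1.start()  # the waiting side may reach its synchronization point first
t0.start()
t0.join()
t1.join()
```

Because the reader blocks until the writer has signaled, the read of R1 always observes the written sum regardless of which thread starts first.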

Next, CPU #0 executes the synca instruction, which is a barrier synchronization, at time t4. At time t4, CPU #1 has not yet reached the synchronization point, so CPU #0 waits until CPU #1 reaches it. At time t5, CPU #1 executes the synca instruction.

As reference numerals 401 and 402 show, in a sequence of processing that includes a synchronization notification, the CPU executes the synchronization notification after the preceding instruction finishes and executes the subsequent instruction after the synchronization notification finishes. Therefore, as table 403 shows, a CPU that executes a synchronization notification does not have to wait for the side performing the synchronization wait.

Similarly, in a sequence of processing that includes a synchronization wait, the CPU executes the synchronization wait after the preceding instruction finishes and executes the subsequent instruction after it has received the synchronization notification. Therefore, a CPU that executes a synchronization wait does not have to wait if it has already received the synchronization notification.

Similarly, in a sequence of processing that includes a barrier synchronization, the CPU executes the subsequent instruction once the preceding instruction has finished and all CPUs belonging to the synchronization group have reached the synchronization point. Therefore, a CPU that executes a barrier synchronization does not have to wait if the other CPUs in the same synchronization group reach the synchronization point at the same time.
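Barrier synchronization over a synchronization group can be illustrated in the same hedged way with Python's `threading.Barrier`. This is a software analogy for the synca instruction, and the two-thread group is chosen only to match the example with threads A_0 and A_1.

```python
import threading

barrier = threading.Barrier(2)  # synchronization group of two threads
order = []
lock = threading.Lock()


def worker(name):
    with lock:
        order.append(("before", name))  # preceding instruction
    barrier.wait()                       # like synca: wait for the whole group
    with lock:
        order.append(("after", name))   # subsequent instruction


threads = [threading.Thread(target=worker, args=(n,)) for n in ("A_0", "A_1")]
for t in threads:
    t.start()
for t in threads:
    t.join()

phases = [phase for phase, _ in order]
```

No thread records its "after" entry until both threads have recorded "before", which is the property the barrier guarantees.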

(Functions of the multi-core processor system 100)
Next, the functions of the multi-core processor system 100 are described. FIG. 5 is a block diagram illustrating an example of the functions of the multi-core processor system. The multi-core processor system 100 has a scheduler 501, a register utilization library 502, and a dispatcher 503.

The multi-core processor system 100 also includes a detection unit 511, an update unit 512, an acquisition unit 513, a determination unit 514, a specification unit 515, a notification unit 516, an execution unit 517, and an allocation unit 518. These functions, which constitute the control unit (detection unit 511 to allocation unit 518), are realized when any one of the CPUs 301 executes a program stored in a storage device. The storage device is, specifically, the ROM 302, the RAM 303, the flash ROM 304, the flash ROM 306, or the like shown in FIG. 3. Alternatively, the functions may be realized by another CPU executing the program via the I/F 308.

Although FIG. 5 illustrates each functional unit as a function of CPU #0, the units may be functions of CPU #1 or CPU #2. The detection unit 511 to the notification unit 516 are functions of the register utilization library 502, the execution unit 517 is a function of the register I/F 102, and the allocation unit 518 is a function of the dispatcher 503.

The multi-core processor system 100 can also access profile information 521, which is described in detail later with reference to FIG. 6. The profile information 521 resides in the RAM 303, the flash ROM 304, the flash ROM 306, or the like.

The scheduler 501 assigns the threads executed in the multi-core processor system 100 to the CPUs and selects the thread to be executed next. For example, the scheduler 501 assigns thread A_0 to CPU #0 and thread A_1 to CPU #1.

Upon receiving a thread assignment notification from the scheduler 501, the register utilization library 502 notifies the register I/F 102 whether to use the sharing method or the copying method for register sharing, or not to share registers at all. The register utilization library 502 also notifies the dispatcher 503 of the thread assignment: if the assignment is unchanged, it passes on the thread assignment notification received from the scheduler 501 as is, and if the assignment has changed, it sends the changed thread assignment notification.

The dispatcher 503 switches from the currently running thread to the next thread determined by the scheduler 501 and the register utilization library 502. For example, when switching CPU #0 from thread A_0 to thread B_0, the dispatcher 503 saves the register information of thread A_0, including its program counter. It then restores the previously saved register information of thread B_0. After the restoration, thread B_0 can continue processing from the point at which it was last switched out.
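The save-and-restore behavior of the dispatcher can be sketched in Python as follows. The `Dispatcher` class, the two-entry register context, and the program counter values are all hypothetical simplifications of what a real dispatcher handles.

```python
class Dispatcher:
    """Toy model of a context switch on one CPU."""

    def __init__(self):
        self.saved = {}                      # thread name -> saved register context
        self.current = None                  # thread currently on the CPU
        self.cpu_regs = {"PC": 0, "R0": 0}   # hypothetical register context

    def switch(self, next_thread):
        # Save the register information of the outgoing thread.
        if self.current is not None:
            self.saved[self.current] = dict(self.cpu_regs)
        # Restore the incoming thread's context, or start from a fresh one.
        self.cpu_regs = dict(self.saved.get(next_thread, {"PC": 0, "R0": 0}))
        self.current = next_thread


d = Dispatcher()
d.switch("B_0")
d.cpu_regs["PC"] = 40   # thread B_0 runs up to PC = 40
d.switch("A_0")
d.cpu_regs["PC"] = 12   # thread A_0 runs up to PC = 12
d.switch("B_0")         # B_0 resumes from where it was switched out
```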

検出部５１１は、スレッドが複数のコアのいずれかのコアに割り当てられることを検出する機能を有する。たとえば、検出部５１１は、スレッドＡ＿０がＣＰＵ＃０に割り当てられることを検出する。 The detection unit 511 has a function of detecting that a thread is assigned to any one of a plurality of cores. For example, the detection unit 511 detects that the thread A_0 is assigned to the CPU # 0.

また、検出部５１１は、いずれかのスレッドにて同期待ちが完了したことを検出してもよい。たとえば、検出部５１１は、実行中のスレッドＡ＿１にて、同期待ちが完了したことを検出する。また、検出対象は、同期通知、バリア同期が含まれてもよい。なお、検出結果は、ＲＡＭ３０３、フラッシュＲＯＭ３０４、フラッシュＲＯＭ３０６などの記憶領域に記憶される。 Further, the detection unit 511 may detect that the synchronization wait has been completed in any thread. For example, the detection unit 511 detects that the synchronization waiting has been completed in the thread A_1 being executed. The detection target may include synchronization notification and barrier synchronization. The detection result is stored in a storage area such as the RAM 303, the flash ROM 304, and the flash ROM 306.

The update unit 512 has a function of updating the number of synchronization notifications and the number of synchronization waits for a thread when the detection unit 511 detects that a synchronization wait has completed in any thread. For example, suppose that the profile information 521 of thread A_0 holds synchronization notifications: 6 and synchronization waits: 0, and that the profile information 521 of thread A_1 holds synchronization notifications: 0 and synchronization waits: 6. Suppose further that the detection unit 511 detects that thread A_1 has completed a synchronization wait for a synchronization notification issued by thread A_0. The update unit 512 then updates the profile information 521 of thread A_0 to synchronization notifications: 5 and synchronization waits: 0, and that of thread A_1 to synchronization notifications: 0 and synchronization waits: 5.
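The counter update in this example can be sketched as below. The class `SyncCounts` and the function `on_wait_completed` are hypothetical names introduced for illustration; the sketch assumes that each completed wait consumes exactly one outstanding notification on the notifying thread and one outstanding wait on the waiting thread.

```python
from dataclasses import dataclass


@dataclass
class SyncCounts:
    """Remaining synchronization instructions of one thread (hypothetical layout)."""
    notifications: int
    waits: int


def on_wait_completed(notifier: SyncCounts, waiter: SyncCounts) -> None:
    """Update both threads after the waiter has received the notifier's notification."""
    notifier.notifications -= 1
    waiter.waits -= 1


a0 = SyncCounts(notifications=6, waits=0)  # thread A_0
a1 = SyncCounts(notifications=0, waits=6)  # thread A_1
on_wait_completed(notifier=a0, waiter=a1)
print(a0, a1)
```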

The acquisition unit 513 has a function of acquiring information indicating that, for at least one of the threads allocated to the plurality of cores, a value based on the difference between the number of synchronization notifications and the number of synchronization waits is greater than a predetermined value. This information indicates that the thread's synchronization instructions are biased, and it is recorded in the profile information 521. The profile information 521 may store an identifier indicating that the synchronization instructions are biased, or it may store the values of the number of synchronization notifications and the number of synchronization waits. The identifier may be set by the designer of the multi-core processor system 100.

Information that the synchronization instructions of a thread are biased means information indicating that the synchronization instructions written in the thread's program are biased. The profile information 521 therefore stores this information for each thread. For example, the acquisition unit 513 acquires information that the synchronization instructions of thread A_0 are biased.

The acquisition unit 513 may also acquire information indicating that, for every thread allocated to the plurality of cores, the value based on the difference between the number of synchronization notifications and the number of synchronization waits is less than or equal to the predetermined value. This information indicates that the synchronization instructions are not biased, and it is recorded in the profile information 521. For example, the profile information 521 stores an identifier indicating that the synchronization instructions are not biased. For example, the acquisition unit 513 acquires, as the profile information 521 of threads A_0 to A_2, information indicating that none of the threads has biased synchronization instructions. The acquired profile information 521, or a pointer to it, is stored in a storage area such as the RAM 303, the flash ROM 304, or the flash ROM 306.

The determination unit 514 has a function of determining whether, for at least one of the threads allocated to the plurality of cores, the value based on the difference between the number of synchronization notifications and the number of synchronization waits is greater than the predetermined value.

具体的な判断方法として、たとえば、判断部５１４は、同期通知数と同期待ち数の差分の絶対値が所定値より大きいか否かを判断する。また、判断部５１４は、同期通知数と同期待ち数の差分の絶対値を、同期命令の総数で除した値が所定値より大きいか否かを判断してもよい。 As a specific determination method, for example, the determination unit 514 determines whether or not the absolute value of the difference between the synchronization notification count and the synchronization wait count is greater than a predetermined value. Further, the determination unit 514 may determine whether or not a value obtained by dividing the absolute value of the difference between the number of synchronization notifications and the number of synchronization waits by the total number of synchronization instructions is greater than a predetermined value.
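The two determination methods described above can be sketched as follows. The function names are hypothetical, and returning `False` for a thread with no synchronization instructions is an assumption made here to avoid division by zero; the source does not address that case.

```python
def is_biased_absolute(notifications: int, waits: int, threshold: float) -> bool:
    """First method: compare the absolute difference with the predetermined value."""
    return abs(notifications - waits) > threshold


def is_biased_relative(notifications: int, waits: int, total: int,
                       threshold: float) -> bool:
    """Second method: divide the absolute difference by the total number of
    synchronization instructions before comparing."""
    if total == 0:
        # Assumption: a thread without synchronization instructions is not biased.
        return False
    return abs(notifications - waits) / total > threshold
```

The relative form is insensitive to thread size: a thread with 6 notifications and 0 waits out of 6 instructions scores 1.0, the same as a thread with 60 notifications and 0 waits out of 60.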

The determination unit 514 may determine whether the value based on the difference is greater than the predetermined value when the detection unit 511 detects that a thread is allocated, or when the update unit 512 updates the information indicating the bias of a thread. The determination result is stored in a storage area such as the RAM 303, the flash ROM 304, or the flash ROM 306.

特定部５１５は、実行部５１７がレジスタを共有させることによりスレッドを実行する場合、スレッドに関する同期通知数と同期待ち数との差の大きさに基づいて、スレッドを特定する機能を有する。具体的な特定方法として、たとえば、特定部５１５は、同期通知数と同期待ち数の差分が最大となるスレッドを特定してもよい。または、特定部５１５は、同期通知数と同期待ち数の差が所定値以上となるスレッドのうちいずれかのスレッドを特定してもよい。または、特定部５１５は、同期通知数と同期待ち数の差を同期命令の総数で除した値が最大となるスレッドを特定してもよい。なお、特定されたスレッドの情報は、ＲＡＭ３０３、フラッシュＲＯＭ３０４、フラッシュＲＯＭ３０６などの記憶領域に記憶される。 The specifying unit 515 has a function of specifying a thread based on the difference between the number of synchronization notifications related to the thread and the number of synchronization waits when the execution unit 517 executes a thread by sharing a register. As a specific specifying method, for example, the specifying unit 515 may specify a thread that maximizes the difference between the synchronization notification count and the synchronization wait count. Alternatively, the specifying unit 515 may specify one of the threads in which the difference between the number of synchronization notifications and the number of synchronization waits is a predetermined value or more. Alternatively, the specifying unit 515 may specify a thread having a maximum value obtained by dividing the difference between the synchronization notification count and the synchronization wait count by the total number of synchronization instructions. Note that the information of the identified thread is stored in a storage area such as the RAM 303, the flash ROM 304, and the flash ROM 306.

The notification unit 516 has a function of notifying the register I/F 102 that the sharing method is to be used when the acquisition unit 513 acquires information indicating that the value based on the difference is greater than the predetermined value. In the sharing method, the register of the core that executes synchronization notifications is shared with the other cores. The notification unit 516 may also give this notification when the determination unit 514 determines that the value based on the difference is greater than the predetermined value. In addition, the notification unit 516 may notify the allocation unit 518 to allocate the thread specified by the specifying unit 515 to the core that serves as the register sharing source among the plurality of cores.

When the acquisition unit 513 acquires profile information 521 indicating that the synchronization instructions are not biased, the notification unit 516 may notify that the copying method is to be used. In the copying method, each time the value of a register of any one of the cores is updated, the value is copied to the registers of the other cores. The notification unit 516 may also notify that the copying method is to be used when the determination unit 514 determines that no thread has biased synchronization instructions.

実行部５１７は、通知部５１６から通知された共有方法を用いるか、または複写方法を用いるか、という指示に従って、複数のコアによりスレッドを実行する機能を有する。たとえば、実行部５１７は、共有方法として、複数のコアのうち同期通知を実行するコアのレジスタを他のコアに共有させることにより、スレッドＡ＿０〜スレッドＡ＿２を実行する。また、実行部５１７は、複写方法として、複数のコアのうちいずれかのコアのレジスタの値が更新される都度、他のコアのレジスタに複写することにより、スレッドＢ＿０〜スレッドＢ＿２を実行する。 The execution unit 517 has a function of executing a thread by a plurality of cores in accordance with an instruction whether to use the sharing method notified from the notification unit 516 or the copy method. For example, as a sharing method, the execution unit 517 executes the thread A_0 to the thread A_2 by causing other cores to share a register of a core that performs synchronization notification among a plurality of cores. Further, as a copying method, the execution unit 517 executes the thread B_0 to the thread B_2 by copying to the register of another core every time the value of the register of one of the plurality of cores is updated.
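The difference between the two methods can be illustrated with a small software model. This is only a sketch under stated assumptions: the classes `SharedRegisters` and `CopiedRegisters` are hypothetical, registers are modeled as dictionary entries, and the actual mechanism in the source is the register I/F 102, which is not modeled here.

```python
from typing import Dict, Iterable


class SharedRegisters:
    """Sharing method: one core owns the register file and every core accesses it."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.file: Dict[str, int] = {}

    def write(self, core: str, name: str, value: int) -> None:
        self.file[name] = value

    def read(self, core: str, name: str) -> int:
        return self.file[name]

    def is_remote(self, core: str) -> bool:
        # Cores other than the owner reach the registers through another core.
        return core != self.owner


class CopiedRegisters:
    """Copying method: each core has its own file; every write is copied to all."""

    def __init__(self, cores: Iterable[str]) -> None:
        self.files: Dict[str, Dict[str, int]] = {c: {} for c in cores}

    def write(self, core: str, name: str, value: int) -> None:
        for regs in self.files.values():
            regs[name] = value

    def read(self, core: str, name: str) -> int:
        # Reads are always local to the reading core.
        return self.files[core][name]
```

In this model only the sharing method distinguishes a local core from remote ones, which corresponds to the asymmetric processing capability described for that method, while the copying method treats all cores alike.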

割当部５１８は、通知部５１６から通知された、特定されたスレッドをレジスタの共有元となるコアに割り当てる機能を有する。たとえば、割当部５１８は、スレッドＡ＿０を、レジスタの共有元となるＣＰＵ＃０に割り当てる。 The allocation unit 518 has a function of allocating the identified thread notified from the notification unit 516 to the core that becomes the register sharing source. For example, the assigning unit 518 assigns the thread A_0 to the CPU # 0 that is the register sharing source.

図６は、プロファイル情報の記憶内容の一例を示す説明図である。図６で示すプロファイル情報５２１は、レコード５２１−１〜レコード５２１−９を登録している。プロファイル情報５２１は、スレッドＩＤ、同期命令の総数、同期通知数、同期待ち数、バリア同期数という５つのフィールドを含む。スレッドＩＤフィールドには、対象スレッドを一意に識別する情報が格納される。同期命令の総数フィールドには、対象スレッド内にある同期通知、同期待ち、バリア同期の総数が格納される。同期通知数フィールドには、対象スレッド内にある同期通知数が格納される。同期待ち数フィールドには、対象スレッド内にある同期待ち数が格納される。バリア同期数フィールドには、対象スレッド内にあるバリア同期数が格納される。 FIG. 6 is an explanatory diagram showing an example of the stored contents of profile information. In the profile information 521 shown in FIG. 6, records 521-1 to 521-9 are registered. The profile information 521 includes five fields: thread ID, total number of synchronization instructions, number of synchronization notifications, number of synchronization waits, and number of barrier synchronizations. The thread ID field stores information for uniquely identifying the target thread. The total number of synchronization notifications, synchronization waits, and barrier synchronizations in the target thread is stored in the total number field of synchronization instructions. The synchronization notification number field stores the number of synchronization notifications in the target thread. The synchronization wait number field stores the number of synchronization waits in the target thread. The barrier synchronization number field stores the number of barrier synchronizations in the target thread.

たとえば、レコード５２１−１は、同期命令の総数が６であり、同期通知数が６であり、同期待ち数とバリア同期数が０であることを示している。なお、プロファイル情報５２１は、開発者がプログラムを作成したときに生成してもよいし、スレッド実行前に、ＯＳがスレッドのバイナリプログラムを解析して生成してもよい。 For example, the record 521-1 indicates that the total number of synchronization instructions is 6, the number of synchronization notifications is 6, and the number of synchronization waits and the number of barrier synchronizations are 0. The profile information 521 may be generated when a developer creates a program, or may be generated by the OS analyzing a binary program of a thread before thread execution.
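One record of the profile information could be represented as shown below. The class name `ProfileRecord`, its field names, and the consistency check are illustrative assumptions; the thread ID `"A_0"` attached to record 521-1 is likewise assumed here, since the paragraph above gives only the counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileRecord:
    """One row of the profile information, with the five fields named in the text."""
    thread_id: str
    total: int          # total number of synchronization instructions
    notifications: int  # number of synchronization notifications
    waits: int          # number of synchronization waits
    barriers: int       # number of barrier synchronizations

    def is_consistent(self) -> bool:
        # The total is described as the count of notifications, waits, and barriers.
        return self.total == self.notifications + self.waits + self.barriers


# Counts taken from the description of record 521-1; the thread ID is assumed.
record_521_1 = ProfileRecord("A_0", total=6, notifications=6, waits=0, barriers=0)
```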

図７は、同期命令に偏りがあるスレッドの実行結果の一例を示す説明図である。符号７０１で示す図は、共有方法を実行しているマルチコアプロセッサシステム１００が、レジスタの共有元となるＣＰＵ＃０に、同期待ちが多いスレッドＡ＿１を割り当て、ＣＰＵ＃１に、同期通知が多いスレッドＡ＿０を割り当てている。また、符号７０２で示す図は、共有方法を実行しているマルチコアプロセッサシステム１００が、レジスタの共有元となるＣＰＵ＃０に、スレッドＡ＿０を割り当て、ＣＰＵ＃１に、スレッドＡ＿１を割り当てている。 FIG. 7 is an explanatory diagram illustrating an example of the execution result of a thread having a biased synchronization instruction. In the diagram indicated by reference numeral 701, the multi-core processor system 100 executing the sharing method assigns a thread A_1 having a lot of synchronization waiting to the CPU # 0 as a register sharing source, and a thread having a lot of synchronization notifications to the CPU # 1. A_0 is assigned. In the diagram indicated by reference numeral 702, the multi-core processor system 100 executing the sharing method assigns the thread A_0 to the CPU # 0 as the register sharing source and assigns the thread A_1 to the CPU # 1.

初めに、符号７０１におけるＣＰＵ＃０は、レジスタの共有元であるため、処理が早く完了し、スレッドＡ＿０からの同期通知を待つことになる。たとえば、ＣＰＵ＃０は、スレッドＡ＿１の処理番号｛１｝を終了した後、ＣＰＵ＃１によるスレッドＡ＿０の処理番号２からの同期通知を待つことになる。処理番号｛３｝、処理番号｛５｝でも同様な現象が発生する。このように、レジスタの共有元となるＣＰＵに、同期待ちが多いスレッドを割り当てると、待ち時間の粒度が小さくなる。 First, since the CPU # 0 in the reference numeral 701 is a register sharing source, the processing is completed quickly, and a synchronization notification from the thread A_0 is awaited. For example, after ending the process number {1} of the thread A_1, the CPU # 0 waits for a synchronization notification from the process number 2 of the thread A_0 by the CPU # 1. A similar phenomenon occurs with process number {3} and process number {5}. In this way, if a thread with a high synchronization wait is assigned to a CPU that is a register sharing source, the latency granularity is reduced.

Next, CPU #0 in the diagram indicated by reference numeral 702 is the register sharing source, so its processing completes early; at processing numbers {2}, {4}, and {6} it sends a synchronization notification to CPU #1 and then waits. Thus, when a thread with many synchronization notifications is assigned to the CPU that is the register sharing source, the granularity of the waiting time becomes large. A larger waiting-time granularity makes DVFS (Dynamic Voltage and Frequency Scaling) easier to use and makes it easier to run the processing of other apps. This is because DVFS has a minimum time span over which it can be applied, so it may not be applicable when the waiting-time granularity is small. Likewise, when the waiting-time granularity is small, the overhead of switching to another process increases.

FIG. 8 is an explanatory diagram illustrating an example of the execution result of threads whose synchronization instructions are not biased. In the diagram indicated by reference numeral 801, the multi-core processor system 100 executing the sharing method assigns thread B_1, whose synchronization instructions are not biased, to CPU #0, which is the register sharing source, and assigns thread B_0, whose synchronization instructions are also not biased, to CPU #1. In the diagram indicated by reference numeral 802, the multi-core processor system 100 executing the copying method assigns thread B_1 to CPU #0 and thread B_0 to CPU #1.

初めに、符号８０１におけるＣＰＵ＃０は、レジスタの共有元であるため、処理が早く完了し、スレッドＢ＿０からの同期通知を待つことになる。たとえば、ＣＰＵ＃０は、スレッドＢ＿０の処理番号｛１｝を終了した後、ＣＰＵ＃１によるスレッドＢ＿０の処理番号｛２｝からの同期通知を待つことになる。処理番号｛５｝でも同様な現象が発生する。 First, since the CPU # 0 in the reference numeral 801 is a register sharing source, the processing is completed early and a synchronization notification from the thread B_0 is awaited. For example, after ending the process number {1} of the thread B_0, the CPU # 0 waits for a synchronization notification from the process number {2} of the thread B_0 by the CPU # 1. A similar phenomenon occurs even with process number {5}.

Next, CPU #0 and CPU #1 in the diagram indicated by reference numeral 802 use the copying method, so their processing speeds are the same, and the time spent waiting for synchronization is shorter than in the diagram indicated by reference numeral 801. Thus, when the synchronization instructions are not biased, executing the threads with the copying method removes the performance difference between the CPUs and reduces the synchronization waiting time, so the multi-core processor system 100 can improve processor utilization efficiency.

FIG. 9 is an explanatory diagram illustrating an example of how the register value sharing method is determined. FIG. 9 describes how to determine whether the sharing method or the copying method is used as the register value sharing method.

The multi-core processor system 100 uses the sharing method when one or more threads in the thread group satisfy expression (1) below.

|(number of synchronization notifications - number of synchronization waits) / total number of synchronization instructions| > α … (1)

Here, |x| denotes the absolute value of x, and α is a constant. The predetermined value described with reference to FIGS. 1 and 2 is, for example, α. For example, α = 0.4. The multi-core processor system 100 uses the copying method when expression (2) below is satisfied.

number of barrier synchronizations / total number of synchronization instructions > β … (2)

Here, β is a constant. For example, β = 0.5. When it is determined that the sharing method is to be used, the multi-core processor system 100 evaluates expression (3) below for each thread and assigns the thread with the largest value to the CPU that serves as the register sharing source.

(number of synchronization notifications - number of synchronization waits) / total number of synchronization instructions … (3)

また、マルチコアプロセッサシステム１００は、（１）式が満たされた場合に共有方法を用い、（１）式が満たされない場合に複写方法を用いてもよい。また、あるスレッドが（１）式を満たし、他のスレッドが（２）式を満たした場合、マルチコアプロセッサシステム１００は、共有方法を用いる。 Further, the multi-core processor system 100 may use the sharing method when the expression (1) is satisfied, and may use the copying method when the expression (1) is not satisfied. Further, when a certain thread satisfies the expression (1) and another thread satisfies the expression (2), the multi-core processor system 100 uses a sharing method.
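The decision procedure built from expressions (1) to (3) can be sketched as follows, using the example constants α = 0.4 and β = 0.5. Several points are assumptions made for this sketch and are not stated in the source: the names `Profile` and `choose_method`, the flag `require_expr2_for_copy`, the reading that expression (2) is satisfied when at least one thread satisfies it, the `"none"` result when neither expression holds, and the treatment of threads with no synchronization instructions as unbiased.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ALPHA = 0.4  # example value of the constant in expression (1)
BETA = 0.5   # example value of the constant in expression (2)


@dataclass
class Profile:
    thread_id: str
    total: int
    notifications: int
    waits: int
    barriers: int


def bias(p: Profile) -> float:
    """Expression (3): signed, normalized difference of notifications and waits."""
    return (p.notifications - p.waits) / p.total if p.total else 0.0


def satisfies_expr1(p: Profile) -> bool:
    return abs(bias(p)) > ALPHA


def satisfies_expr2(p: Profile) -> bool:
    return p.total > 0 and p.barriers / p.total > BETA


def choose_method(group: List[Profile],
                  require_expr2_for_copy: bool = False
                  ) -> Tuple[str, Optional[str]]:
    """Return (method, thread to place on the register sharing source CPU)."""
    # Expression (1) takes priority, including when another thread satisfies (2).
    if any(satisfies_expr1(p) for p in group):
        source = max(group, key=bias)  # largest value of expression (3)
        return "share", source.thread_id
    # Variant in which copying is used whenever expression (1) is not satisfied.
    if not require_expr2_for_copy:
        return "copy", None
    if any(satisfies_expr2(p) for p in group):
        return "copy", None
    return "none", None  # assumed fallback: no register sharing
```

For a hypothetical group in which one thread issues only notifications, for example `Profile("A_0", 6, 6, 0, 0)` alongside threads that only wait, `choose_method` returns `("share", "A_0")`, which matches placing the notification-heavy thread on the sharing-source CPU.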

以下、図９で示したレジスタ値共有方法の判断方法を実行して、第１のスレッド群〜第３のスレッド群の実行結果を図１０〜図１５にて説明する。スレッド群は、たとえば、それぞれ異なるアプリに属しているとする。たとえば、第１のスレッド群がアプリ１に属し、第２のスレッド群がアプリ２に属し、第３のスレッド群がアプリ３に属している。 Hereinafter, the determination method of the register value sharing method shown in FIG. 9 is executed, and execution results of the first to third thread groups will be described with reference to FIGS. For example, it is assumed that the thread groups belong to different applications. For example, the first thread group belongs to the app 1, the second thread group belongs to the app 2, and the third thread group belongs to the app 3.

The first thread group is assumed to be a group of threads whose synchronization instructions are biased, for example threads A_0 to A_2 shown in FIG. 6. The second thread group is assumed to be a group of threads whose synchronization instructions are not biased, for example threads B_0 to B_2 shown in FIG. 6. The third thread group is assumed to be a mix of threads whose synchronization instructions are biased and threads whose synchronization instructions are not biased, for example threads C_0 to C_2 shown in FIG. 6.

図１０は、第１のスレッド群の前提条件の一例を示す説明図である。表１００１には、共有方法でのＣＰＵの処理能力と、複写方法でのＣＰＵの処理能力を示しており、前提条件１００２では、第１のスレッド群となるスレッドＡ＿０〜スレッドＡ＿２の処理量と、同期通知および同期待ちの詳細について示している。また、表１００３は、スレッドＡ＿０〜スレッドＡ＿２に関する式（１）〜式（３）の算出結果を示している。なお、図１０におけるスレッドＡ＿０〜スレッドＡ＿２のプロファイル情報５２１としては、図６で示した値と同一である。 FIG. 10 is an explanatory diagram illustrating an example of a precondition for the first thread group. A table 1001 shows the CPU processing capacity in the sharing method and the CPU processing capacity in the copying method. In the precondition 1002, the processing amount of the thread A_0 to the thread A_2 as the first thread group, Details of synchronization notification and waiting for synchronization are shown. A table 1003 shows the calculation results of the expressions (1) to (3) regarding the thread A_0 to the thread A_2. Note that the profile information 521 of the thread A_0 to the thread A_2 in FIG. 10 is the same as the value shown in FIG.

表１００１に示すように、たとえば、共有方法にて自身のレジスタにアクセスするＣＰＵの処理能力を３００［命令数／ｕｓ］であるとし、共有方法にて他のＣＰＵのレジスタにアクセスするＣＰＵの処理能力を１００［命令数／ｕｓ］であるとする。また、複写方法のＣＰＵの処理能力を１５０［命令数／ｕｓ］であるとする。 As shown in Table 1001, for example, the processing capacity of a CPU that accesses its own register by the sharing method is 300 [number of instructions / us], and the processing of the CPU that accesses the register of another CPU by the sharing method Assume that the capability is 100 [number of instructions / us]. Further, it is assumed that the processing capability of the CPU of the copying method is 150 [number of instructions / us].

また、前提条件１００２では、たとえば、スレッドＡ＿０の処理番号｛１｝は、処理量が６００［命令数］であり、処理番号｛５｝へ同期通知を送信する。続けて、スレッドＡ＿０は、処理番号｛４｝、｛７｝、｛１０｝、｛１３｝、｛１４｝の順で処理を行う。また、スレッドＡ＿１の処理番号｛２｝は、処理量４５０［命令数］であり、同期命令は行わない。続けて、スレッドＡ＿１は、処理番号｛５｝、｛８｝、｛１１｝の順で処理を行う。また、スレッドＡ＿２の処理番号｛３｝は、処理量６００［命令数］であり、同期命令は行わない。続けて、スレッドＡ＿２は、処理番号｛６｝、｛９｝、｛１２｝の順で処理を行う。 Further, in the precondition 1002, for example, the processing number {1} of the thread A_0 has a processing amount of 600 [number of instructions], and a synchronization notification is transmitted to the processing number {5}. Subsequently, the thread A_0 performs processing in the order of processing numbers {4}, {7}, {10}, {13}, and {14}. Further, the processing number {2} of the thread A_1 has a processing amount of 450 [number of instructions], and no synchronous instruction is performed. Subsequently, the thread A_1 performs processing in the order of processing numbers {5}, {8}, and {11}. Further, the processing number {3} of the thread A_2 has a processing amount of 600 [number of instructions], and no synchronous instruction is performed. Subsequently, the thread A_2 performs processing in the order of processing numbers {6}, {9}, and {12}.

As shown in table 1003, the multi-core processor system 100 evaluates expressions (1) and (2) for threads A_0 to A_2. For example, expression (1) for thread A_0 is evaluated as follows.

|(6 - 0) / 6| = 1 > 0.4

このように、スレッドＡ＿０は（１）式を満たしている。同様に、スレッドＡ＿０に対する（２）式、スレッドＡ＿１、スレッドＡ＿２に対する（１）式、（２）式を算出する。（１）式の算出結果について、スレッドＡ＿０〜スレッドＡ＿２全てが（１）式を満たしたため、マルチコアプロセッサシステム１００は、共有方法を用いる。また、マルチコアプロセッサシステム１００は、（３）式を実行し、（３）式の算出結果より、スレッドＡ＿０が最も大きい値となるため、スレッドＡ＿０をＣＰＵ＃０に割り当てる。 Thus, the thread A_0 satisfies the expression (1). Similarly, the formula (2) for the thread A_0, the formula (1) and the formula (2) for the thread A_1 and the thread A_2 are calculated. Regarding the calculation result of the expression (1), since all the threads A_0 to A_2 satisfy the expression (1), the multi-core processor system 100 uses a sharing method. Further, the multi-core processor system 100 executes Expression (3) and assigns the thread A_0 to the CPU # 0 because the thread A_0 has the largest value from the calculation result of the expression (3).

図１１は、共有方法、または複写方法を用いて第１のスレッド群を実行した場合の結果の一例を示す説明図である。図１１の例では、タイムチャート１１０１は、図１０で判断したように、共有方法を用い、スレッドＡ＿０をＣＰＵ＃０に割り当てた場合の結果を示している。また、比較として、タイムチャート１１０２は、共有方法を用い、スレッドＡ＿０をＣＰＵ＃２に割り当てた場合の結果を示している。同様に、タイムチャート１１０３は、複写方法を用いた場合の結果を示している。なお、各処理にかかる時間は、前提条件１００２にて示した処理量を、表１００１で示した処理能力で除算した結果である。 FIG. 11 is an explanatory diagram illustrating an example of a result when the first thread group is executed using the sharing method or the copying method. In the example of FIG. 11, the time chart 1101 shows the result when the thread A_0 is assigned to the CPU # 0 using the sharing method as determined in FIG. For comparison, the time chart 1102 shows the result when the thread A_0 is assigned to the CPU # 2 using the sharing method. Similarly, a time chart 1103 shows the results when the copying method is used. The time required for each process is the result of dividing the processing amount shown in the precondition 1002 by the processing capacity shown in the table 1001.
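The per-process time calculation can be reproduced with a few lines. The processing capabilities are the example values given for table 1001 (300, 100, and 150 instructions/us); the dictionary keys and the function name are hypothetical labels chosen for this sketch.

```python
# Example processing capabilities from table 1001, in instructions per microsecond.
CAPABILITY = {
    "share_own": 300,     # sharing method, CPU accessing its own registers
    "share_remote": 100,  # sharing method, CPU accessing another CPU's registers
    "copy": 150,          # copying method, any CPU
}


def processing_time_us(instructions: int, mode: str) -> float:
    """Time for one process: processing amount divided by processing capability."""
    return instructions / CAPABILITY[mode]


# A process of 600 instructions under each condition.
for mode in CAPABILITY:
    print(mode, processing_time_us(600, mode))
```

A 600-instruction process therefore takes 2 us on the sharing-source CPU, 6 us on a CPU that accesses another CPU's registers, and 4 us under the copying method.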

In time chart 1101, CPU #0 executing thread A_0 executes processing numbers {1}, {4}, {7}, {10}, {13}, and {14} and finishes at 11.5 [us]. CPU #1 executing thread A_1 executes processing numbers {2}, {5}, {8}, and {11} and finishes at 19.5 [us]. Similarly, CPU #2 executing thread A_2 executes processing numbers {3}, {6}, {9}, and {12} and finishes at 19.5 [us].

タイムチャート１１０２にて、スレッドＡ＿０を実行するＣＰＵ＃２は、図示していないが、３４．５［ｕｓ］に処理を終了する。タイムチャート１１０１の結果と比較すると、タイムチャート１１０２ではスレッドＡ＿０の処理に時間がかかり、結果、通知待ちを行うスレッドＡ＿１、スレッドＡ＿２の処理にも時間がかかるようになってしまっている。また、スレッドＡ＿２を実行するＣＰＵ＃０は、たとえば、２［ｕｓ］から１２［ｕｓ］まで待ち時間が発生してしまっている。 In the time chart 1102, the CPU # 2 executing the thread A_0 ends the process at 34.5 [us], which is not illustrated. Compared with the results of the time chart 1101, in the time chart 1102, the processing of the thread A_0 takes time, and as a result, the processing of the thread A_1 and the thread A_2 waiting for notification also takes time. Further, the CPU # 0 executing the thread A_2 has a waiting time from 2 [us] to 12 [us], for example.

In time chart 1103, CPU #0 executing thread A_0 finishes at 23 [us]. CPU #1 executing thread A_1 finishes at 22 [us], and CPU #2 executing thread A_2 finishes at 26 [us]. Compared with time chart 1101, thread A_0 takes longer to process in time chart 1103. In addition, the synchronization waiting time of CPU #1 and CPU #2 is fragmented into short intervals. For example, CPU #1 waits in short intervals such as 3 [us] to 4 [us] and 8 [us] to 11 [us], and CPU #2 waits in intervals such as 4 [us] to 8 [us] and 11 [us] to 15 [us].

図１２は、第２のスレッド群の前提条件の一例を示す説明図である。表１００１には、共有方法でのＣＰＵの処理能力と、複写方法でのＣＰＵの処理能力を示しており、前提条件１２０１では、第２のスレッド群となるスレッドＢ＿０〜スレッドＢ＿２の処理量と、同期通知および同期待ちの詳細について示している。また、表１２０２は、スレッドＢ＿０〜スレッドＢ＿２に関する式（１）〜式（３）の算出結果を示している。なお、図１２におけるスレッドＢ＿０〜スレッドＢ＿２のプロファイル情報５２１は、図６で示した値と同一である。なお、表１００１は図１０で説明した値と同一であるため、説明を省略する。 FIG. 12 is an explanatory diagram illustrating an example of a precondition for the second thread group. A table 1001 shows the CPU processing capacity in the sharing method and the CPU processing capacity in the copying method. In the precondition 1201, the processing amount of the thread B_0 to the thread B_2 as the second thread group, Details of synchronization notification and waiting for synchronization are shown. Table 1202 shows calculation results of Expressions (1) to (3) regarding the thread B_0 to the thread B_2. Note that the profile information 521 of the thread B_0 to thread B_2 in FIG. 12 is the same as the value shown in FIG. The table 1001 is the same as the value described in FIG.

前提条件１２０１で示すように、たとえば、スレッドＢ＿０の処理番号｛１｝は、処理量が６００［命令数］であり、処理番号｛５｝へ同期通知を送信する。次に、スレッドＢ＿０の処理番号｛４｝は、処理量が４５０［命令数］であり、処理番号｛３｝からの同期待ちを行い、処理番号｛８｝へ同期通知を送信する。続けて、スレッドＢ＿０は、処理番号｛７｝、｛１０｝の順で処理を行う。また、スレッドＢ＿１の処理番号｛２｝は、処理量４５０［命令数］であり、処理番号｛６｝へ同期通知を送信する。続けて、スレッドＢ＿１は、処理番号｛５｝、｛８｝、｛１１｝の順で処理を行う。また、スレッドＢ＿２の処理番号｛３｝は、処理量６００［命令数］であり、処理番号｛４｝へ同期通知を送信する。続けて、スレッドＢ＿２は、処理番号｛６｝、｛９｝、｛１２｝の順で処理を行う。 As indicated by the precondition 1201, for example, the processing number {1} of the thread B_0 has a processing amount of 600 [number of instructions] and transmits a synchronization notification to the processing number {5}. Next, the processing number {4} of the thread B_0 has a processing amount of 450 [number of instructions], waits for synchronization from the processing number {3}, and transmits a synchronization notification to the processing number {8}. Subsequently, the thread B_0 performs processing in the order of processing numbers {7} and {10}. Further, the processing number {2} of the thread B_1 has a processing amount of 450 [number of instructions], and a synchronization notification is transmitted to the processing number {6}. Subsequently, the thread B_1 performs processing in the order of processing numbers {5}, {8}, and {11}. Further, the processing number {3} of the thread B_2 has a processing amount of 600 [number of instructions], and a synchronization notification is transmitted to the processing number {4}. Subsequently, the thread B_2 performs processing in the order of processing numbers {6}, {9}, and {12}.

As shown in table 1202, the multi-core processor system 100 evaluates Expressions (1) and (2) for threads B_0 to B_2. Because none of threads B_0 to B_2 satisfies Expression (1), the multi-core processor system 100 uses the copying method.

FIG. 13 is an explanatory diagram illustrating an example of the results of executing the second thread group with the sharing method or the copying method. In the example of FIG. 13, time chart 1301 shows the result when the copying method is used, as determined in FIG. 12. For comparison, time chart 1302 shows the result when the sharing method is used.

In time chart 1301, CPU #0, which executes thread B_0, executes process numbers {1}, {4}, {7}, and {10} and finishes processing at 16 [us]. CPU #1, which executes thread B_1, executes process numbers {2}, {5}, {8}, and {11} and finishes processing at 16 [us]. Similarly, CPU #2, which executes thread B_2, executes process numbers {3}, {6}, {9}, and {12} and finishes processing at 15 [us].

In time chart 1302, CPU #0, which executes thread B_0, finishes processing at 20 [us]. CPU #1, which executes thread B_1, finishes processing at 21 [us]. CPU #2, which executes thread B_2, finishes processing at 22.5 [us]. Compared with time chart 1301, threads B_1 and B_2 take longer to process in time chart 1302.
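The comparison between the two time charts can be made concrete by taking the latest finish time across the CPUs in each chart. The following is a minimal sketch using only the finish times stated for time charts 1301 and 1302; the function name `makespan` and the dictionary layout are assumptions introduced for illustration.

```python
# Finish times in microseconds, as stated for each time chart.
finish_copying = {"CPU#0": 16.0, "CPU#1": 16.0, "CPU#2": 15.0}  # time chart 1301
finish_sharing = {"CPU#0": 20.0, "CPU#1": 21.0, "CPU#2": 22.5}  # time chart 1302


def makespan(finish_times):
    """Time at which the last CPU finishes, i.e. when the whole thread group is done."""
    return max(finish_times.values())


ratio = makespan(finish_sharing) / makespan(finish_copying)
print(makespan(finish_copying), makespan(finish_sharing), ratio)
```

With these figures the thread group completes at 16 [us] under the copying method and at 22.5 [us] under the sharing method, which is consistent with the copying method having been selected for the second thread group.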

FIG. 14 is an explanatory diagram illustrating an example of the preconditions for the third thread group. Table 1001 shows the processing capacity of the CPUs under the sharing method and under the copying method. Precondition 1401 shows the processing amounts of threads C_0 to C_2, which form the third thread group, together with the details of their synchronization notifications and synchronization waits. Table 1402 shows the results of evaluating Expressions (1) to (3) for threads C_0 to C_2. The profile information 521 for threads C_0 to C_2 in FIG. 14 has the same values as those shown in FIG. 6. Table 1001 has the same values as those described with reference to FIG. 10, so its description is omitted.

As shown in precondition 1401, for example, process number {1} of thread C_0 has a processing amount of 600 [instructions] and transmits a synchronization notification to process number {5}, and process number {4} has a processing amount of 600 [instructions] and transmits a synchronization notification to process number {6}. Process number {2} of thread C_1 has a processing amount of 450 [instructions] and performs no synchronization processing. Process number {3} of thread C_2 has a processing amount of 600 [instructions] and transmits a synchronization notification to process number {5}.

As shown in table 1402, the multi-core processor system 100 evaluates Expressions (1) and (2) for threads C_0 to C_2. Because threads C_0 and C_1 satisfy Expression (1), the multi-core processor system 100 uses the sharing method. The multi-core processor system 100 then evaluates Expression (3) and, because thread C_0 yields the largest value, assigns thread C_0 to CPU #0.

FIG. 15 is an explanatory diagram illustrating an example of the results of executing the third thread group with the sharing method or the copying method. In the example of FIG. 15, time chart 1501 shows the result when the sharing method is used and thread C_0 is assigned to CPU #0, as determined in FIG. 14. For comparison, time chart 1502 shows the result when the sharing method is used and thread C_0 is assigned to CPU #2. Similarly, time chart 1503 shows the result when the copying method is used.

In time chart 1501, CPU #0, which executes thread C_0, executes process numbers {1}, {4}, {7}, {10}, {13}, and {14} and finishes processing at 11.5 [us]. CPU #1, which executes thread C_1, executes process numbers {2}, {5}, {8}, and {11} and finishes processing at 21 [us]. Similarly, CPU #2, which executes thread C_2, executes process numbers {3}, {6}, {9}, and {12} and finishes processing at 19.5 [us].

In time chart 1502, CPU #2, which executes thread C_0, finishes processing at 34.5 [us], although this is not shown in the figure. Compared with time chart 1501, thread C_0 takes longer to process in time chart 1502, and as a result threads C_1 and C_2, which wait for its notifications, also take longer. In addition, CPU #0, which executes thread C_2, incurs a waiting time, for example, from 2 [us] to 12 [us].

In time chart 1503, CPU #0, which executes thread C_0, finishes processing at 23 [us]. CPU #1, which executes thread C_1, finishes processing at 22 [us], and CPU #2, which executes thread C_2, finishes processing at 26 [us]. Compared with time chart 1501, thread C_0 takes longer to process in time chart 1503. In addition, the synchronization waits on CPU #1 and CPU #2 are fragmented. For example, CPU #1 waits during short intervals such as 3 [us] to 4 [us] and 8 [us] to 11 [us], and CPU #2 waits during intervals such as 4 [us] to 8 [us] and 11 [us] to 15 [us].
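The fragmentation of waiting time can be quantified by listing the wait intervals per CPU and measuring their lengths. The following is a minimal sketch that uses only the example intervals quoted for time charts 1502 and 1503; these are examples from the text, not a complete list of waits in either chart, and the names `waits_1502`, `waits_1503`, and `interval_lengths` are hypothetical.

```python
# Example wait intervals (start, end) in microseconds, as quoted in the text.
waits_1502 = {"CPU#0": [(2, 12)]}                      # sharing, C_0 on CPU #2
waits_1503 = {"CPU#1": [(3, 4), (8, 11)],              # copying
              "CPU#2": [(4, 8), (11, 15)]}


def interval_lengths(intervals):
    """Length of each wait interval in microseconds."""
    return [end - start for start, end in intervals]


for chart, waits in (("1502", waits_1502), ("1503", waits_1503)):
    for cpu, intervals in waits.items():
        lengths = interval_lengths(intervals)
        print(chart, cpu, lengths, "longest:", max(lengths))
```

On these figures the quoted waits in time chart 1503 last at most 4 [us] each, whereas the single quoted wait in time chart 1502 lasts 10 [us]. Short, scattered idle periods of the former kind are harder to put to other use than one long idle period.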

As shown in FIG. 15, when the synchronization instructions of even one of the plurality of threads are biased, the multi-core processor system 100 can use the sharing method to process the bottleneck thread at high speed and thereby improve the utilization efficiency of CPU #0 to CPU #2.

Next, FIGS. 16 and 17 show flowcharts of the register use processing illustrated in FIGS. 10 to 15. The register use processing executed by the multi-core processor system 100 may be either the processing shown in FIG. 16 or the processing shown in FIG. 17. The register use processing may be performed by any of CPU #0 to CPU #2. In the present embodiment, the case where CPU #0 executes the register use processing is described as an example.

FIG. 16 is a flowchart illustrating an example of the register use processing. The register use processing shown in FIG. 16 is triggered by a thread assignment notification from the scheduler. CPU #0 detects that the scheduler 501 is assigning a thread to one of CPU #0 to CPU #2 (step S1601). In the description of FIG. 16 below, the thread being assigned is referred to as the target thread. CPU #0 determines whether the target thread is subject to fine-grained parallel processing (step S1602). This determination is made according to whether the profile information 521 contains a record corresponding to the target thread.

If the target thread is subject to fine-grained parallel processing (step S1602: Yes), CPU #0 acquires the profile information corresponding to the thread to be assigned (step S1603). Next, CPU #0 evaluates Expressions (1) and (2) using the profile information (step S1604). When a plurality of threads are assigned for the execution of one application, CPU #0 evaluates Expressions (1) and (2) for each of those threads.

Based on the results of Expressions (1) and (2), CPU #0 determines whether the synchronization instructions are biased (step S1605). If they are biased (step S1605: Yes), CPU #0 notifies the register I/F 102 that the sharing method is to be used with a specific CPU as the register sharing source (step S1606). CPU #0 then evaluates Expression (3) using the profile information (step S1607). Based on the result, CPU #0 notifies the dispatcher 503 to assign the thread with the largest value of Expression (3) to the specific CPU (step S1608), and ends the register use processing.

If the synchronization instructions are not biased (step S1605: No), CPU #0 notifies the register I/F 102 that the copying method is to be used (step S1609). Next, CPU #0 notifies the dispatcher 503 to assign the target thread as instructed by the scheduler 501 (step S1610), and ends the register use processing.

If the target thread is not subject to fine-grained parallel processing (step S1602: No), CPU #0 notifies the register I/F 102 that register values are not to be shared (step S1611), and proceeds to step S1610.
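The branch structure of steps S1602 to S1611 can be outlined in a few lines. The following is a minimal sketch, not the implementation of the embodiment: Expressions (1) and (2) are not reproduced here, so the bias check is assumed to be "absolute difference between the notification count and the wait count exceeds a threshold", and `SyncProfile`, `RegisterMethod`, `is_biased`, and `choose_method` are hypothetical names. The counts in the usage lines are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class RegisterMethod(Enum):
    SHARING = "sharing"  # step S1606
    COPYING = "copying"  # step S1609
    NONE = "none"        # step S1611


@dataclass
class SyncProfile:
    """Hypothetical stand-in for one record of profile information 521."""
    thread: str
    notifications: int  # number of synchronization notifications
    waits: int          # number of synchronization waits


def is_biased(profile: SyncProfile, threshold: int) -> bool:
    # Assumption: stands in for Expressions (1) and (2); not the patent's formula.
    return abs(profile.notifications - profile.waits) > threshold


def choose_method(profiles: Optional[Sequence[SyncProfile]],
                  threshold: int) -> RegisterMethod:
    if not profiles:
        # Step S1602: No. No profile record, so no register sharing at all.
        return RegisterMethod.NONE
    if any(is_biased(p, threshold) for p in profiles):
        # Step S1605: Yes. At least one thread has biased synchronization.
        return RegisterMethod.SHARING
    # Step S1605: No. Every thread is balanced.
    return RegisterMethod.COPYING


# Illustrative counts only; not taken from the figures.
group = [SyncProfile("T_0", 4, 0), SyncProfile("T_1", 1, 1)]
print(choose_method(group, threshold=2))
```

The sketch returns the method only; in the flowchart the result is sent to the register I/F 102, and the dispatcher 503 is notified separately.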

FIG. 17 is a flowchart illustrating another example of the register use processing. The register use processing shown in FIG. 17 is triggered by the completion of a synchronization instruction within a thread. The steps in FIG. 17 other than steps S1701, S1704, S1705, and S1708 are the same as the corresponding steps in FIG. 16, so their description is omitted.

CPU #0 detects that a synchronization instruction has completed in a running thread (step S1701) and proceeds to step S1702. In the description of FIG. 17 below, the running thread is referred to as the target thread. Of synchronization notification, synchronization wait, and barrier synchronization, the synchronization instruction to be detected may be limited to synchronization wait. The reason is that when completion of a synchronization notification is detected, a synchronization wait is expected to follow shortly, and limiting detection in this way prevents the register use processing from being executed too frequently. Barrier synchronization likewise need not be included among the synchronization instructions to be detected.

After executing step S1703, CPU #0 decrements the counts in the profile information 521 by the number of synchronization instructions that have been issued (step S1704). Next, CPU #0 evaluates Expressions (1) and (2) using the updated profile information 521 (step S1705) and proceeds to step S1706.

After executing step S1707, CPU #0 evaluates Expression (3) using the updated profile information 521 (step S1708) and proceeds to step S1709.
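The update in steps S1704 and S1705 can be illustrated as follows. This is a minimal sketch for a single thread, whereas the flowchart evaluates every thread in the group. It assumes that the profile holds the remaining numbers of synchronization notifications and waits, and that the bias check is a threshold on their absolute difference, which stands in for Expressions (1) and (2). `RemainingSync` and `on_sync_completed` are hypothetical names, and the counts are invented.

```python
from dataclasses import dataclass


@dataclass
class RemainingSync:
    """Hypothetical view of profile information 521 for one thread."""
    notifications: int  # synchronization notifications still to be issued
    waits: int          # synchronization waits still to be performed


def on_sync_completed(profile: RemainingSync, issued_notifications: int,
                      issued_waits: int, threshold: int) -> str:
    # Step S1704: decrement the profile by the number of instructions issued.
    profile.notifications = max(0, profile.notifications - issued_notifications)
    profile.waits = max(0, profile.waits - issued_waits)
    # Step S1705 onward: re-evaluate the bias on the updated profile.
    # Assumption: absolute difference against a threshold, not the patent's formula.
    biased = abs(profile.notifications - profile.waits) > threshold
    return "sharing" if biased else "copying"


profile = RemainingSync(notifications=4, waits=0)
print(on_sync_completed(profile, 1, 0, threshold=2))  # 3 notifications remain
print(on_sync_completed(profile, 3, 0, threshold=2))  # none remain
```

In this example the thread starts out biased, and once it has issued all of its notifications the re-evaluation selects the copying method for the rest of its execution.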

FIG. 18 is an explanatory diagram illustrating an application example of a system using the computer according to the present embodiment. In FIG. 18, the network NW is a network over which the server 1801 and the clients 1811 to 1814 can communicate, and includes, for example, a LAN, a WAN, the Internet, and a mobile phone network.

The client 1811 is a notebook PC (Personal Computer). The client 1812 is a desktop PC, and the client 1813 is a mobile phone. As a mobile phone, the client 1813 may be a smartphone or a PHS (Personal Handy-phone System) terminal. The client 1814 is a tablet terminal.

The server 1801 and the clients 1811 to 1814 in FIG. 18 each execute the register utilization method according to the present embodiment as the multi-core processor system described in the embodiment. For example, a plurality of CPUs in the server 1801 execute the register utilization method according to the present embodiment.

As described above, the multi-core processor system, the register utilization method, and the register utilization program acquire information indicating that the synchronization instructions issued by the threads are biased, and use the sharing method, in which the register of the CPU that performs synchronization notification is shared with the other CPUs. As a result, the CPU that performs synchronization notification runs faster and the CPUs waiting for its notifications wait for less time, so overall processing performance improves. Furthermore, when a thread with many synchronization notifications is assigned to the CPU that serves as the register sharing source, the waiting time occurs in larger blocks, which makes it easier for the multi-core processor system to apply DVFS or to execute other applications.

The multi-core processor system may also acquire information indicating that the synchronization instructions issued by the threads are not biased, and use the copying method, in which each time a CPU updates the value of its own register, the value is copied to the registers of the other CPUs. Because this removes the performance difference between the CPUs, the synchronization waiting time decreases and the processing capacity of the CPUs can be improved.

The multi-core processor system may also determine, for at least one thread, whether the value based on the difference is greater than a predetermined value, and use the sharing method if it is. Thus, even when a thread group with biased synchronization instructions and a thread group without such bias are executed in sequence, the multi-core processor system can switch the register utilization method to the sharing method when executing the biased threads and improve overall processing capacity.

The multi-core processor system may also determine, for at least one thread, whether the value based on the difference between the number of synchronization notifications and the number of synchronization waits is greater than a predetermined value, and use the copying method if the value is equal to or less than the predetermined value. Thus, even when a thread group with biased synchronization instructions and a thread group without such bias are executed in sequence, the multi-core processor system can switch the register utilization method to the copying method when executing the unbiased threads and improve overall processing capacity.

The multi-core processor system may also determine, when a thread is assigned to a CPU, whether the value based on the difference is greater than the predetermined value. This allows the multi-core processor system to switch the utilization method before the thread starts executing.

When the multi-core processor system detects that a synchronization wait has completed in a thread, it may update the number of synchronization notifications and the number of synchronization waits, and determine whether the synchronization instructions are biased based on the difference between the updated numbers. This allows the multi-core processor system to use a more suitable register utilization method even while the thread is executing.

For example, suppose a thread group whose synchronization instructions were biased at assignment time, and which has therefore been running under the sharing method, finishes issuing all of its synchronization instructions in the first half of its processing. In that case the copying method gives better processing capacity for the second half. When executing such a thread group, the multi-core processor system detects the completion of a synchronization wait and switches from the sharing method to the copying method, which improves overall processing capacity compared with keeping the sharing method throughout.

When using the sharing method, the multi-core processor system may specify the thread to be assigned to the CPU that serves as the register sharing source, based on the magnitude of the difference between the number of synchronization notifications and the number of synchronization waits of each thread. This allows the multi-core processor system to assign the thread that keeps other threads waiting the most to the CPU with the fastest processing, which reduces the waiting time of the other threads and improves processing capacity.
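The selection of the thread for the register sharing source can be sketched as a simple maximum over the threads. The following is a minimal sketch assuming that the ranking value is the number of synchronization notifications minus the number of synchronization waits; this stands in for Expression (3), which is not reproduced here. The function name `pick_thread_for_source_cpu` and the counts are hypothetical.

```python
from typing import Dict, Tuple


def pick_thread_for_source_cpu(counts: Dict[str, Tuple[int, int]]) -> str:
    """Return the thread to place on the register sharing source CPU.

    counts maps a thread name to (notifications, waits).
    Assumption: the thread with the largest notifications-minus-waits value
    is the one that keeps other threads waiting the most.
    """
    return max(counts, key=lambda thread: counts[thread][0] - counts[thread][1])


# Illustrative counts only; not taken from profile information 521.
counts = {"T_0": (4, 0), "T_1": (0, 3), "T_2": (1, 1)}
print(pick_thread_for_source_cpu(counts))
```

Using the signed difference rather than its absolute value keeps a thread that mostly waits, such as `T_1` here, from being chosen ahead of a thread that mostly notifies.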

The register utilization method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The register utilization program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The register utilization program may also be distributed via a network such as the Internet.

The following supplementary notes are further disclosed with respect to the embodiment described above.

(Supplementary Note 1) A multi-core processor system comprising:
acquisition means for acquiring information indicating that a value based on the difference between the number of synchronization notifications and the number of synchronization waits of at least one of the threads assigned to the respective cores of a plurality of cores is greater than a predetermined value; and
execution means for executing, when the information is acquired by the acquisition means, the threads on the plurality of cores by causing the other cores to share a register of the core, among the plurality of cores, that executes synchronization notification.

(Supplementary Note 2) A multi-core processor system comprising:
acquisition means for acquiring information indicating that, for every one of the threads assigned to the respective cores of a plurality of cores, a value based on the difference between the number of synchronization notifications and the number of synchronization waits is equal to or less than a predetermined value; and
execution means for executing, when the information is acquired by the acquisition means, the threads on the plurality of cores by copying the value of a register of any one of the plurality of cores to the registers of the other cores each time that value is updated.

(Supplementary Note 3) A multi-core processor system comprising:
determination means for determining whether a value based on the difference between the number of synchronization notifications and the number of synchronization waits of at least one of the threads assigned to the respective cores of a plurality of cores is greater than a predetermined value; and
execution means for executing, when the determination means determines that the value is greater than the predetermined value, the threads on the plurality of cores by causing the other cores to share a register of the core, among the plurality of cores, that executes synchronization notification.

(Supplementary Note 4) The multi-core processor system according to Supplementary Note 3, wherein,
when the determination means determines that the value is equal to or less than the predetermined value for every thread, the execution means executes the threads on the plurality of cores by copying the value of a register of any one of the plurality of cores to the registers of the other cores each time that value is updated.

(Supplementary Note 5) The multi-core processor system according to Supplementary Note 3 or 4, further comprising detection means for detecting that the thread is to be assigned to any one of the plurality of cores, wherein,
when the detection means detects that the thread is to be assigned, the determination means determines whether the value based on the difference between the number of synchronization notifications and the number of synchronization waits of at least one of the threads assigned to the respective cores of the plurality of cores is greater than the predetermined value.

(Supplementary Note 6) The multi-core processor system according to Supplementary Note 3 or 4, further comprising:
detection means for detecting that a synchronization wait has completed in any one of the threads; and
update means for updating, when the detection means detects that the synchronization wait has completed in that thread, the number of synchronization notifications and the number of synchronization waits of the thread, wherein,
when the update means has updated the number of synchronization notifications and the number of synchronization waits of the thread, the determination means determines whether the value based on the difference between the number of synchronization notifications and the number of synchronization waits of at least one of the threads assigned to the respective cores of the plurality of cores is greater than the predetermined value.

(Supplementary Note 7) The multi-core processor system according to any one of Supplementary Notes 3, 5, and 6, further comprising:
specifying means for specifying, when the execution means executes the threads by the sharing, a thread from among the plurality of threads based on the difference between the number of synchronization notifications and the number of synchronization waits of each thread; and
assignment means for assigning the thread specified by the specifying means to the core, among the plurality of cores, that serves as the register sharing source.

(Supplementary Note 8) A register utilization method in which a specific core among a plurality of cores executes processing comprising:
acquiring information indicating that a value based on the difference between the number of synchronization notifications and the number of synchronization waits of at least one of the threads assigned to the respective cores of the plurality of cores is greater than a predetermined value; and
executing, when the information is acquired, the threads on the plurality of cores by causing the other cores to share a register of the core, among the plurality of cores, that executes synchronization notification.

(Supplementary Note 9) A register utilization method in which a specific core among a plurality of cores executes processing comprising:
acquiring information indicating that, for every one of the threads assigned to the respective cores of the plurality of cores, a value based on the difference between the number of synchronization notifications and the number of synchronization waits is equal to or less than a predetermined value; and
executing, when the information is acquired, the threads on the plurality of cores by copying the value of a register of any one of the plurality of cores to the registers of the other cores each time that value is updated.

(Supplementary Note 10) A register utilization method in which a specific core among a plurality of cores executes processing comprising:
determining whether a value based on the difference between the number of synchronization notifications and the number of synchronization waits of at least one of the threads assigned to the respective cores of the plurality of cores is greater than a predetermined value; and
executing, when the value is determined to be greater than the predetermined value, the threads on the plurality of cores by causing the other cores to share a register of the core, among the plurality of cores, that executes synchronization notification.

(Supplementary Note 11) A register utilization program that causes a specific core among a plurality of cores to execute processing comprising:
acquiring information indicating that a value based on the difference between the number of synchronization notifications and the number of synchronization waits of at least one of the threads assigned to the respective cores of the plurality of cores is greater than a predetermined value; and
causing, when the information is acquired, the plurality of cores to execute the threads by causing the other cores to share a register of the core, among the plurality of cores, that executes synchronization notification.

（付記１２）複数のコアのそれぞれに割り当てられるスレッドのいずれのスレッドについても同期通知数と同期待ち数との差分に基づいた値が所定値以下であることを示す情報を取得し、
前記情報が取得された場合、前記複数のコアのうちいずれかのコアのレジスタの値が更新される都度、他のコアのレジスタに複写することにより、前記複数のコアに前記スレッドを実行させる、
処理を前記複数のコアのうち特定のコアに実行させるレジスタ利用プログラム。 (Supplementary Note 12) A register utilization program that causes a specific core among a plurality of cores to execute processing comprising:
acquiring information indicating that, for every one of the threads allocated to the respective cores, a value based on the difference between the number of synchronization notifications and the number of synchronization waits is less than or equal to a predetermined value; and
when the information is acquired, causing the plurality of cores to execute the threads by copying the value of a register of any one of the plurality of cores to the registers of the other cores each time that value is updated.

（付記１３）複数のコアのそれぞれに割り当てられるスレッドのうち少なくともいずれか一つのスレッドに関する同期通知数と同期待ち数との差分に基づいた値が所定値より大きいか否かを判断し、
前記値が前記所定値より大きいと判断された場合、前記複数のコアのうち同期通知を実行するコアのレジスタを他のコアに共有させることにより、前記複数のコアに前記スレッドを実行させる、
処理を前記複数のコアのうち特定のコアに実行させるレジスタ利用プログラム。 (Supplementary Note 13) A register utilization program that causes a specific core among a plurality of cores to execute processing comprising:
determining whether a value based on the difference between the number of synchronization notifications and the number of synchronization waits for at least one of the threads allocated to the respective cores is greater than a predetermined value; and
when the value is determined to be greater than the predetermined value, causing the plurality of cores to execute the threads by causing the other cores to share a register of the core, among the plurality of cores, that performs synchronization notification.
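
The selection rule that recurs throughout the supplementary notes can be illustrated with a minimal sketch. It is not taken from the specification: the names `ThreadProfile`, `RegisterMode`, `imbalance`, and `choose_register_mode` are hypothetical, and the "value based on the difference" is assumed here to be the absolute difference between the two counts, which the text does not confirm.

```python
from dataclasses import dataclass
from enum import Enum


class RegisterMode(Enum):
    SHARE = "share"  # other cores use the register of the notifying core
    COPY = "copy"    # each register update is copied to the other cores


@dataclass
class ThreadProfile:
    """Hypothetical per-thread record of synchronization counts."""
    name: str
    notify_count: int  # synchronization notifications issued by the thread
    wait_count: int    # synchronization waits performed by the thread


def imbalance(profile: ThreadProfile) -> int:
    # Assumption: the "value based on the difference" is the absolute
    # difference between the notification count and the wait count.
    return abs(profile.notify_count - profile.wait_count)


def choose_register_mode(profiles, threshold):
    """Pick a register utilization mode from the thread profiles.

    SHARE if at least one thread's value exceeds the threshold;
    COPY if every thread's value is less than or equal to it.
    """
    if any(imbalance(p) > threshold for p in profiles):
        return RegisterMode.SHARE
    return RegisterMode.COPY


if __name__ == "__main__":
    profiles = [ThreadProfile("A_1", 10, 2), ThreadProfile("A_2", 3, 3)]
    print(choose_register_mode(profiles, threshold=5))
```

The two branches are complementary: sharing is chosen as soon as any one thread exceeds the predetermined value, while copying on every update is chosen only when all threads are at or below it.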

＃０〜＃２ＣＰＵ
Ａ＿１〜Ａ＿２スレッド
１０１バス
５０１スケジューラ
５０２レジスタ利用ライブラリ
５０３ディスパッチャ
５１１検出部
５１２更新部
５１３取得部
５１４判断部
５１５特定部
５１６通知部
５１７実行部
５１８割当部
５２１プロファイル情報
#0 to #2 CPU
A_1 to A_2 Thread
101 Bus
501 Scheduler
502 Register use library
503 Dispatcher
511 Detection unit
512 Update unit
513 Acquisition unit
514 Judgment unit
515 Identification unit
516 Notification unit
517 Execution unit
518 Allocation unit
521 Profile information

Claims

A multi-core processor system comprising:
acquisition means for acquiring information indicating that a value based on the difference between the number of synchronization notifications and the number of synchronization waits for at least one of the threads allocated to the respective cores of a plurality of cores is greater than a predetermined value; and
execution means for, when the information is acquired by the acquisition means, causing the plurality of cores to execute the threads by causing the other cores to share a register of the core, among the plurality of cores, that performs synchronization notification.

A multi-core processor system comprising:
acquisition means for acquiring information indicating that, for every one of the threads allocated to the respective cores of a plurality of cores, a value based on the difference between the number of synchronization notifications and the number of synchronization waits is less than or equal to a predetermined value; and
execution means for, when the information is acquired by the acquisition means, causing the plurality of cores to execute the threads by copying the value of a register of any one of the plurality of cores to the registers of the other cores each time that value is updated.

A multi-core processor system comprising:
determination means for determining whether a value based on the difference between the number of synchronization notifications and the number of synchronization waits for at least one of the threads allocated to the respective cores of a plurality of cores is greater than a predetermined value; and
execution means for, when the determination means determines that the value is greater than the predetermined value, causing the plurality of cores to execute the threads by causing the other cores to share a register of the core, among the plurality of cores, that performs synchronization notification.

The multi-core processor system according to claim 3, wherein
the execution means, when the determination means determines that the value is less than or equal to the predetermined value for every one of the threads, causes the plurality of cores to execute the threads by copying the value of a register of any one of the plurality of cores to the registers of the other cores each time that value is updated.

The multi-core processor system according to claim 3 or 4, further comprising
detection means for detecting that a thread has been allocated to any one of the plurality of cores, wherein
the determination means, when the detection means detects that the thread has been allocated, determines whether the value based on the difference between the number of synchronization notifications and the number of synchronization waits for at least one of the threads allocated to the respective cores is greater than the predetermined value.

The multi-core processor system according to claim 3 or 4, further comprising:
detection means for detecting completion of a synchronization wait in any one of the threads; and
update means for updating the number of synchronization notifications and the number of synchronization waits for that thread when the detection means detects that the synchronization wait has completed, wherein
the determination means, when the update means has updated the number of synchronization notifications and the number of synchronization waits for that thread, determines whether the value based on the difference between the number of synchronization notifications and the number of synchronization waits for at least one of the threads allocated to the respective cores is greater than the predetermined value.

The multi-core processor system according to claim 3, further comprising:
specifying means for specifying, when the execution means executes the threads with the register shared, a thread from among the plurality of threads based on the difference between the number of synchronization notifications and the number of synchronization waits for that thread; and
allocation means for allocating the thread specified by the specifying means to the core, among the plurality of cores, that is the sharing source of the register.

A register utilization method in which a specific core among a plurality of cores executes processing comprising:
acquiring information indicating that a value based on the difference between the number of synchronization notifications and the number of synchronization waits for at least one of the threads allocated to the respective cores is greater than a predetermined value; and
when the information is acquired, causing the plurality of cores to execute the threads by causing the other cores to share a register of the core, among the plurality of cores, that performs synchronization notification.

A register utilization program that causes a specific core among a plurality of cores to execute processing comprising:
acquiring information indicating that a value based on the difference between the number of synchronization notifications and the number of synchronization waits for at least one of the threads allocated to the respective cores is greater than a predetermined value; and
when the information is acquired, causing the plurality of cores to execute the threads by causing the other cores to share a register of the core, among the plurality of cores, that performs synchronization notification.