JP7059776B2

JP7059776B2 - Parallelization method, parallelization tool, and multi-core microcontroller

Info

Publication number: JP7059776B2
Application number: JP2018083128A
Authority: JP
Inventors: 憲一峰田
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2018-04-24
Filing date: 2018-04-24
Publication date: 2022-04-26
Anticipated expiration: 2038-04-24
Also published as: JP2019191870A; DE102019205674A1

Description

The present invention relates to a parallelization method and a parallelization tool that generate a parallel program for a multi-core microcomputer from a single program for a single-core microcomputer, and to a multi-core microcomputer that executes the parallel program generated from the single program.

Conventionally, the parallelizing compilation method disclosed in Patent Document 1 is known as an example of a parallelization method that generates a parallel program for a multi-core microcomputer from a single program for a single-core microcomputer.

In this parallelizing compilation method, the source code of the single program undergoes lexical analysis and syntax analysis and is expanded into an intermediate language, and this intermediate language is used to analyze the dependencies among a plurality of macro tasks (processing units), perform optimization, and so on. The conventional method then generates a parallel program by assigning the macro tasks to cores and scheduling them, based on the dependencies among the macro tasks and the execution time of each macro task.

Patent Document 1: Japanese Unexamined Patent Publication No. 2015-1807

Suppose that a multi-core microcomputer executing a parallel program has a plurality of memories, such as a plurality of RAMs and a plurality of ROMs, that are accessible by a plurality of cores, and that the time the cores require to access those memories (the access latency) differs. In that case, the execution time of the parallel program can vary depending on which memory each piece of data is stored in. For example, suppose that a first core and a second core each execute the processing units assigned to them, and that data accessed more frequently by the first core than by the second core is stored in a memory whose required access time from the first core is longer than that from the second core. The execution time of the parallel program then becomes longer, because the first core, which accesses the data more often, needs more time for each access.

The present invention has been made in view of the above, and its object is to provide a parallelization method and a parallelization tool that generate a parallel program capable of reducing, as a whole, the time a plurality of cores spend accessing memory when the parallel program is executed, and a multi-core microcomputer that executes that parallel program.

To achieve the above object, one aspect of the present disclosure is a parallelization method for generating, from a single program for a single-core microcomputer having one core, a parallel program (18a) for a multi-core microcomputer (20) that has a plurality of cores (C0, C1, C2) and a plurality of memories (L0, L1, L2, G0, G1) accessible by the plurality of cores, the plurality of memories including memories whose required access times from the plurality of cores differ. The method comprises:
a parallel program generation procedure (10a to 10e) that, for each task included in the single program and made up of a plurality of processing units, determines the allocation of the processing units to the plurality of cores and their execution order based on the dependencies among the processing units, and generates the parallel program so that the processing units are executed according to the determined allocation and execution order;
a storage memory determination procedure (10h) that determines, from among the plurality of memories, the memory in which to store data so that the time the plurality of cores spend accessing the data is reduced as a whole, based on allocation information on the allocation of the processing units to the cores, data access information on the data accessed by the processing units, execution frequency information indicating the execution frequency of each task, and required access time information indicating the time each core requires to access each of the memories; and
a memory map generation procedure (10i) that generates a memory map (18b) indicating the relationship between the data and the storage destination memory determined by the storage memory determination procedure.

According to the parallelization method of the present disclosure, the storage memory determination procedure uses the allocation information on the allocation of the processing units to the cores, the data access information on the data accessed by the processing units, the execution frequency information indicating the execution frequency of each task, and the required access time information indicating the time each core requires to access each memory. This makes it possible to determine, from among the plurality of memories, the memory in which to store the data so that the time the cores spend accessing the data is reduced as a whole.

Furthermore, according to the parallelization method of the present disclosure, the memory map generation procedure generates a memory map indicating the relationship between the determined data and its storage destination memory. By using this memory map, the memory in which each piece of data is stored in the multi-core microcomputer can be set so as to satisfy the data-to-memory relationship contained in the memory map. As a result, the data access time of the plurality of cores of the multi-core microcomputer can be reduced as a whole, which in turn shortens the execution time of the parallel program.
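The storage memory determination described above can be illustrated with a minimal sketch. It assumes a simple per-datum greedy choice that ignores memory capacity, which the text here does not specify; every name and number below (`allocation`, `data_access`, `exec_freq`, `latency`, the data names `a` and `b`) is hypothetical and is not taken from the patent figures.

```python
# Hypothetical inputs; names and values are illustrative assumptions.

# Core allocation information: (task, processing unit) -> core
allocation = {("task1", 1): "C0", ("task1", 2): "C1", ("task2", 1): "C0"}

# Data access information: (task, processing unit) -> {data: accesses per execution}
data_access = {
    ("task1", 1): {"a": 2, "b": 1},
    ("task1", 2): {"b": 3},
    ("task2", 1): {"a": 1},
}

# Execution frequency information: task -> executions per unit time
exec_freq = {"task1": 10, "task2": 1}

# Required access time information: core -> {memory: latency in cycles}
latency = {
    "C0": {"L0": 1, "L1": 4, "G0": 8},
    "C1": {"L0": 4, "L1": 1, "G0": 8},
}


def accesses_per_core(allocation, data_access, exec_freq):
    """Aggregate accesses per unit time as data -> core -> count."""
    totals = {}
    for (task, unit), per_data in data_access.items():
        core = allocation[(task, unit)]
        for name, count in per_data.items():
            by_core = totals.setdefault(name, {})
            by_core[core] = by_core.get(core, 0) + count * exec_freq[task]
    return totals


def choose_memories(allocation, data_access, exec_freq, latency):
    """Pick, for each datum, the memory with the smallest total access time."""
    memories = sorted({m for per_core in latency.values() for m in per_core})
    memory_map = {}
    for name, by_core in accesses_per_core(allocation, data_access, exec_freq).items():
        def total_time(mem):
            # Sum of (accesses per unit time) x (latency) over all cores.
            return sum(n * latency[core][mem] for core, n in by_core.items())
        memory_map[name] = min(memories, key=total_time)
    return memory_map


memory_map = choose_memories(allocation, data_access, exec_freq, latency)
print(memory_map)
```

With these assumed inputs, datum `a` is accessed only by C0 and lands in C0's fastest memory, while `b` is accessed mostly by C1 and lands in C1's fastest memory. A real tool would also have to respect memory capacities, which this sketch leaves out.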

Another aspect of the present disclosure is a parallelization tool for generating, from a single program for a single-core microcomputer having one core, a parallel program (18a) for a multi-core microcomputer (20) that has a plurality of cores (C0, C1, C2) and a plurality of memories (L0, L1, L2, G0, G1) accessible by the plurality of cores, the plurality of memories including memories whose required access times from the plurality of cores differ. The tool comprises:
a parallel program generation unit (10a to 10e) that, for each task included in the single program and made up of a plurality of processing units, determines the allocation of the processing units to the plurality of cores and their execution order based on the dependencies among the processing units, and generates the parallel program so that the processing units are executed according to the determined allocation and execution order;
a storage memory determination unit (10h) that determines, from among the plurality of memories, the memory in which to store data so that the time the plurality of cores spend accessing the data is reduced as a whole, based on allocation information on the allocation of the processing units to the cores, data access information on the data accessed by the processing units, execution frequency information indicating the execution frequency of each task, and required access time information indicating the time each core requires to access each of the memories; and
a memory map generation unit (10i) that generates a memory map (18b) indicating the relationship between the data and the storage destination memory determined by the storage memory determination unit.

According to the parallelization tool of the present disclosure, the storage memory determination unit uses the allocation information on the allocation of the processing units to the cores, the data access information on the data accessed by the processing units, the execution frequency information indicating the execution frequency of each task, and the required access time information indicating the time each core requires to access each memory. This makes it possible to determine, from among the plurality of memories, the memory in which to store the data so that the time the cores spend accessing the data is reduced as a whole.

The memory map generation unit then generates a memory map indicating the relationship between the determined data and its storage destination memory. By using this memory map, the memory in which each piece of data is stored in the multi-core microcomputer can be set so as to satisfy the data-to-memory relationship contained in the memory map. As a result, the data access time of the plurality of cores of the multi-core microcomputer can be reduced as a whole, which in turn shortens the execution time of the parallel program.

Yet another aspect of the present disclosure is a multi-core microcomputer that executes a parallel program (18a) for a multi-core microcomputer (20) having a plurality of cores (C0, C1, C2), the parallel program being generated from a single program for a single-core microcomputer having one core.
The multi-core microcomputer has a plurality of memories (L0, L1, L2, G0, G1) accessible by the plurality of cores, the plurality of memories including memories whose required access times from the plurality of cores differ.
The parallel program is generated such that, for each task included in the single program and made up of a plurality of processing units, the allocation of the processing units to the plurality of cores and their execution order are determined based on the dependencies among the processing units, and the processing units are executed according to the determined allocation and execution order.
When the plurality of cores access data stored in the plurality of memories in order to execute their assigned processing units, the memory to be accessed is the memory set according to a memory map (18b). The memory map is generated by determining, from among the plurality of memories, the memory in which to store the data so that the time the plurality of cores spend accessing the data is reduced as a whole, based on allocation information on the allocation of the processing units to the cores, data access information on the data accessed by the processing units, execution frequency information indicating the execution frequency of each task, and required access time information indicating the time each core requires to access each of the memories, and it indicates the relationship between the determined data and its storage destination memory.

The memory map defines, for the data that the cores access in order to execute their assigned processing units, the optimum memory in which to store that data. Therefore, by using this memory map, the memory in which each piece of data is stored in the multi-core microcomputer can be set so as to satisfy the data-to-memory relationship contained in the memory map. As a result, the data access time of the plurality of cores of the multi-core microcomputer can be reduced as a whole, which in turn shortens the execution time of the parallel program.

The reference numerals in parentheses above merely show an example of the correspondence with specific configurations in the embodiment described later, in order to facilitate understanding of the present disclosure, and are not intended to limit the scope of the invention in any way.

Technical features recited in the claims other than those described above will become apparent from the description of the embodiment given below and the accompanying drawings.

A block diagram showing the schematic configuration of a computer serving as the automatic parallelization tool in the embodiment.
A block diagram showing the functions of the computer serving as the automatic parallelization tool in the embodiment.
A diagram showing an example of core allocation information in which the processing units of each task are allocated to a plurality of cores.
A diagram showing an example of the data access information, generated by the data access analysis unit, on the data accessed by the processing units of each task.
A diagram showing an example of the arrangement of memories relative to each core in the multi-core microcomputer.
A diagram showing an example of access latency information indicating the required access time from each core to each memory.
A flowchart showing the process of determining the storage destination memory and creating the memory map for data stored in RAM.
A flowchart showing the process of determining the storage destination memory and creating the memory map for data stored in data ROM.
A flowchart showing the process of determining the storage destination memory and creating the memory map for data stored in code ROM.
A diagram showing an example of the calculated frequency of access to each piece of data by each processing unit per unit time.
A diagram showing an example of the calculated frequency of access to each piece of data by each core per unit time.
A diagram showing an example of the calculated total required access time per unit time, taking the access latency information into account, on the assumption that each piece of data is stored in each memory.

An embodiment for carrying out the invention is described below with reference to the drawings. In this embodiment, a computer 10 serving as the parallelization tool generates, from a single program for a single-core microcomputer having one core, a parallel program 18a parallelized for a multi-core microcomputer 20 having three cores C0, C1, and C2. The number of cores in the multi-core microcomputer 20 is not limited to three; it may be two, or four or more.

The background to generating the parallel program 18a from a single program in this way is that program size tends to grow year by year as control becomes more sophisticated, whereas there is a limit to how far the performance of a single-core microcomputer can be improved. For example, raising the operating frequency of a single-core microcomputer to improve its processing capacity is possible only up to a point, and a higher operating frequency also increases heat generation and power consumption. For this reason, applying the multi-core microcomputer 20, which improves processing capacity by increasing the number of cores, is considered effective.

If the program developer had to allocate processing appropriately to each core and schedule it so that the multi-core capability is fully exploited, the program development load would increase. Automatically generating the parallel program 18a from a single program is therefore technically significant as a way to reduce that development load. In addition, automatic generation of the parallel program 18a from a single program makes it possible to reuse existing software assets developed for single processors.

First, the configuration of the computer 10 is described with reference to FIG. 1. The computer 10 corresponds to a parallelization tool that executes the parallelization method, and generates the parallel program 18a from a single program. In this embodiment, the computer 10 is configured to generate the parallel program 18a written in C from a single program written in C. Consequently, as shown in FIG. 2, the parallel program 18a' that is stored in the ROM of the multi-core microcomputer 20 described later and executed by the multi-core microcomputer 20 is the result of further compiling the parallel program 18a with a compiler 19 into binary code.

However, the present invention is not limited to this. The single program may be written in a programming language other than C. The parallel program 18a may be written, for example, in the intermediate language used when analyzing the single program. Alternatively, the computer 10 may generate both a parallel program written in C and a parallel program written in the intermediate language. Furthermore, the computer 10 may incorporate the function of the compiler 19 and directly generate the binary-code parallel program 18a'.

As shown in FIG. 1, the computer 10 includes a display 11, an HDD 12, a CPU 13, a ROM 14, a RAM 15, an input device 16, a reading unit 17, and the like. The computer 10 can read the contents stored in a storage medium 1 through the reading unit 17. As shown in FIG. 1, the storage medium 1 stores, for example, an automatic parallelizing compiler 1a. The computer 10 and the storage medium 1 are the same as the personal computer 100 and the storage medium 180 described in Japanese Unexamined Patent Publication No. 2015-1807, so refer to that publication for details.

The automatic parallelizing compiler 1a is software that causes the computer 10 to execute the procedures for generating the parallel program 18a. The automatic parallelizing compiler 1a thus enables the computer 10 to execute the parallelization method. In other words, the automatic parallelizing compiler 1a is a program that includes the parallelization method. By executing the automatic parallelizing compiler 1a, the computer 10 acts as the parallelization tool and generates the parallel program 18a.

Next, the functions and processing procedures that the computer 10, as the parallelization tool, has for generating the parallel program 18a from a single program are described with reference to FIG. 2. FIG. 2 represents the functions and processing procedures of the computer 10 as functional blocks. As shown in FIG. 2, the computer 10 functions as a lexical analysis unit 10a, a syntax and semantic analysis unit 10b, a dependency analysis unit 10c, a core allocation and scheduling unit 10d, a code generation unit 10e, a task execution frequency analysis unit 10f, a data access analysis unit 10g, a storage memory determination unit 10h, and a memory map generation unit 10i.

In this embodiment, as shown in FIG. 2, the computer 10 does not analyze the entire single program for controlling the controlled device at once to generate a parallel program. Instead, it generates a parallel program for a single program that has been divided by independent processing function (task). Because the parallel program 18a generated in this embodiment can run faster when executed on the multi-core microcomputer 20, suitable controlled devices include, for example, a vehicle-mounted engine or electric motor, which require fast processing. In that case, the multi-core microcomputer 20 is embodied as an in-vehicle device such as an engine control device, a motor control device, or a hybrid control device.

A single program divided by task contains a plurality of processing units, and executing those processing units realizes the processing function intended for that task. The processing units thus cooperate to realize the intended processing function. They include, for example, a later processing unit that refers to variable data processed by an earlier processing unit, and a later processing unit that is executed as a result of a conditional branch in an earlier processing unit.

Here, a processing unit refers to a core placement unit, which is the minimum unit assigned to a core, or to a function. A core placement unit may also be called a processing block, a macro task, or simply a processing unit. The relationship between a core placement unit and a function is core placement unit ≥ function. That is, a function may be a core placement unit itself, or it may be a parent function or sub-function contained in a core placement unit.

The lexical analysis unit 10a and the syntax and semantic analysis unit 10b perform lexical analysis and syntax and semantic analysis on the source code of a single program written in C, and expand it into an intermediate language. The intermediate language produced by the lexical analysis unit 10a and the syntax and semantic analysis unit 10b contains general-purpose instructions. The lexical analysis unit 10a and the syntax and semantic analysis unit 10b correspond to FE3 in Japanese Unexamined Patent Publication No. 2015-1807, so refer to that publication for details.

The dependency analysis unit 10c analyzes the dependencies among the processing units contained in the single program expanded into the intermediate language, and extracts the processing units that can be executed in parallel. The dependencies include data dependencies, in which a later processing unit refers to variable data updated by an earlier processing unit, and control dependencies, in which a later processing unit is the target of a conditional branch in an earlier processing unit. Processing units with such dependencies must be executed in an order that respects them. In this embodiment, as described above, the target of parallelization is a single program divided by task. The processing units contained in a single program divided by task have data dependencies and control dependencies.
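As an illustration of extracting parallel-executable processing units from dependencies, the following sketch groups units into levels so that every unit in a level depends only on units in earlier levels. This is a generic topological layering chosen for illustration, not the analysis defined in the cited publication; the function name `parallel_levels` and the sample dependency graph are hypothetical.

```python
def parallel_levels(deps):
    """Group units into levels; units in the same level have no mutual dependency.

    deps maps each processing unit to the set of units that must run before it
    (data or control dependencies).
    """
    remaining = {unit: set(before) for unit, before in deps.items()}
    levels = []
    while remaining:
        # Units whose dependencies have all been satisfied can run in parallel.
        ready = sorted(u for u, before in remaining.items() if not before)
        if not ready:
            raise ValueError("cyclic dependency")
        levels.append(ready)
        for unit in ready:
            del remaining[unit]
        for before in remaining.values():
            before.difference_update(ready)
    return levels


# Hypothetical task: units 2 and 3 both read data written by unit 1,
# and unit 4 needs the results of units 2 and 3.
deps = {1: set(), 2: {1}, 3: {1}, 4: {2, 3}}
levels = parallel_levels(deps)
print(levels)  # [[1], [2, 3], [4]]
```

In this assumed graph, units 2 and 3 fall into the same level and are therefore candidates for parallel execution on different cores.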

The core allocation and scheduling unit 10d allocates the processing units to the three cores C0 to C2 based on the analysis result from the dependency analysis unit 10c. In doing so, the core allocation and scheduling unit 10d allocates the processing units so that, for example, processing units that can be executed in parallel run concurrently on two or more of the cores C0 to C2. At an arbitrary timing, such as each time the allocation of the processing units of a task is finished or when the allocation of the processing units of all tasks is complete, the core allocation and scheduling unit 10d outputs core allocation information, which indicates which processing unit has been allocated to which of the cores C0 to C2, to the storage memory determination unit 10h.

FIG. 3 shows an example of the core allocation information. In the core allocation information of FIG. 3, the single program contains two tasks, a first task and a second task. The first task contains six processing units: the processing units numbered "1" and "4" are allocated to core C0, those numbered "2" and "5" to core C1, and those numbered "3" and "6" to core C2. The second task contains four processing units: the processing units numbered "1" and "4" are allocated to core C0, the processing unit numbered "2" to core C1, and the processing unit numbered "3" to core C2. The processing unit numbers are assigned for convenience, to distinguish the processing units within each of the first and second tasks.
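The allocation described for FIG. 3 can be written down as a small data structure. The assignments below follow the text; the representation itself (a nested dictionary named `core_allocation` and the helper `core_of`) is an assumption made for illustration and is not the format used by the tool.

```python
# Core allocation information as described for FIG. 3:
# task -> core -> processing unit numbers allocated to that core.
core_allocation = {
    "task1": {"C0": [1, 4], "C1": [2, 5], "C2": [3, 6]},
    "task2": {"C0": [1, 4], "C1": [2], "C2": [3]},
}


def core_of(task, unit):
    """Return the core to which a processing unit of a task is allocated."""
    for core, units in core_allocation[task].items():
        if unit in units:
            return core
    raise KeyError((task, unit))


print(core_of("task1", 5))  # C1
print(core_of("task2", 4))  # C0
```

Because processing unit numbers restart within each task, a lookup needs both the task and the unit number.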

The core allocation and scheduling unit 10d also schedules the processing units allocated to the three cores C0 to C2. Specifically, it determines the execution schedule of each processing unit allocated to the cores C0 to C2 based on the execution time and dependencies of each processing unit. Because the dependency analysis unit 10c and the core allocation and scheduling unit 10d correspond to MP5 of Japanese Unexamined Patent Publication No. 2015-1807, refer to that publication for details.

The code generation unit 10e generates program code corresponding to the parallel program so that the processing units of each task are executed according to the allocation to the cores C0 to C2 and the execution order determined by the core allocation and scheduling unit 10d. The computer 10 outputs the program code generated by the code generation unit 10e as the parallel program 18a.

The task execution frequency analysis unit 10f analyzes the execution frequency of each task by extracting information on that task's execution frequency from the program of the task to be parallelized. Specifically, the task execution frequency analysis unit 10f extracts information indicating the execution period of the task as the execution frequency information of the task. It then outputs the extracted execution frequency information to the storage memory determination unit 10h.

The data access analysis unit 10g analyzes whether each processing unit contained in the single program, as expanded into the intermediate language by the syntax/semantic analysis unit 10b, accesses data stored in a memory (RAM, data ROM, or code ROM) of the multi-core microcomputer 20 and, if so, which data it accesses. At an arbitrary timing, for example when the analysis of all tasks has been completed, the data access analysis unit 10g compiles the analysis result into data access information and outputs it to the storage memory determination unit 10h. The data that is stored in the memories of the multi-core microcomputer 20 and accessed by the processing units of each task includes at least one of variable or constant data stored in the RAM, constant data stored in the data ROM, and function data that is stored in the code ROM and shared as processing units. The data access analysis unit 10g generates the data access information separately for the data stored in the RAM, the data stored in the data ROM, and the data stored in the code ROM.

As an example, FIG. 4 shows the data access information generated by the data access analysis unit 10g for the data stored in the RAM of the multi-core microcomputer 20. In the example of FIG. 4, the data numbers 1 to 10 are assigned for convenience in order to distinguish all data that is stored in the RAM of the multi-core microcomputer 20 and accessed by any processing unit of any task contained in the parallel program 18a. However, the data access information may exclude data accessed by only a single processing unit and be narrowed down to data accessed by a plurality of processing units. This is because data accessed by a single processing unit only needs to be stored in the RAM that can be accessed in the shortest time from the core to which that processing unit is allocated.

The data access information also indicates which data, identified by data number, is accessed by each processing unit, identified by processing unit number, of each task. For example, the data access information in FIG. 4 indicates that the processing unit numbered "1" of the first task accesses the data numbered "1", "3", and "7" once each, and that the processing unit numbered "2" of the first task accesses the data numbered "2", "3", "7", and "8" once each. The data access information likewise indicates which data is accessed by the other processing units of the first task and by each processing unit of the second task.
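A possible in-memory form of the data access information, together with the optional narrowing to data accessed by more than one processing unit, is sketched below. Only the two processing units whose accesses are spelled out in the description are included, so the result applies to this partial table and not to the whole of FIG. 4. The names `DATA_ACCESS` and `shared_data` are assumptions made for illustration.

```python
from collections import defaultdict

# Partial data access information: only the two processing units whose
# accesses are stated in the text are reproduced. Each value maps a data
# number to the number of accesses per execution of the processing unit.
DATA_ACCESS = {
    ("task1", 1): {1: 1, 3: 1, 7: 1},
    ("task1", 2): {2: 1, 3: 1, 7: 1, 8: 1},
}


def shared_data(data_access):
    """Return the data numbers accessed by more than one processing unit."""
    users = defaultdict(set)
    for unit, accesses in data_access.items():
        for data_no in accesses:
            users[data_no].add(unit)
    return sorted(d for d, units in users.items() if len(units) > 1)


print(shared_data(DATA_ACCESS))  # [3, 7]
```

Data items that this filter drops would simply be placed in the RAM closest to the single core that uses them, as the description notes.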

In addition to the task execution frequency information, the data access information of each processing unit, and the core allocation information of the processing units described above, access latency information (required access time information) 2 is input to the storage memory determination unit 10h, as shown in FIG. 2. The access latency information 2 indicates the required access time from each of the cores C0 to C2 to each memory in the multi-core microcomputer 20. It is prepared in advance by obtaining, through calculation or actual measurement, the required access time from each of the cores C0 to C2 to each memory, based on the arrangement of the memories relative to the cores C0 to C2 in the multi-core microcomputer 20 on which the parallel program 18a' is to be installed.

For example, assume that the multi-core microcomputer 20 has the RAM arrangement shown in FIG. 5 as the arrangement of memories relative to the cores C0 to C2. Specifically, in the example of FIG. 5, a local RAM L0 is arranged near the core C0, a local RAM L1 near the core C1, and a local RAM L2 near the core C2. A global RAM G0 is connected to the bus that links the core C0 and the core C1, and a global RAM G1 is connected to the bus that links the core C2 with the bus connecting the cores C0 and C1. Although omitted from FIG. 5, the multi-core microcomputer 20 also has, as memories, a plurality of data ROMs that store constant data and the like used for control, and a plurality of code ROMs that store the parallel program 18a and function data. The required access time from each of the cores C0 to C2 also differs among these data ROMs and code ROMs.

FIG. 6 shows an example of the access latency information 2 obtained by determining the required access time from each of the cores C0 to C2 to each of the memories L0 to L2 and G0 to G1 for the arrangement shown in FIG. 5. FIG. 6 expresses each required access time in units of the instruction execution cycle of the multi-core microcomputer 20.

As shown in FIG. 6, when the core C0 accesses the local RAM L0, for example, the access takes a time corresponding to 1 cycle. When the core C0 accesses the local RAM L1, however, it takes 3 cycles; the local RAM L2, 5 cycles; the global RAM G0, 2 cycles; and the global RAM G1, 4 cycles. In general, the closer a memory L0 to L2 is to a core C0 to C2, the shorter the required access time from that core. Conversely, the farther a memory is from a core (for example, the memory L2 relative to the core C0), the longer the required access time. Consequently, if data that is frequently accessed by a processing unit allocated to the core C0 were stored in the local RAM L2, for example, accesses from the core C0 to the local RAM L2 would take a long time, and the execution time of the parallel program 18a would increase as a result.
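The effect of a poor placement can be quantified with a small sketch. The latency row below holds the cycle counts stated above for the core C0; the rows for the cores C1 and C2 are not reproduced because the description does not list them in full. The function names and the access count of 100 are assumptions chosen purely for illustration.

```python
# Required access time of the core C0 in instruction cycles, as stated for FIG. 6.
LATENCY_C0 = {"L0": 1, "L1": 3, "L2": 5, "G0": 2, "G1": 4}


def memories_by_latency(latency_row):
    """Return the memory names ordered from fastest to slowest for one core."""
    return sorted(latency_row, key=latency_row.get)


def penalty(latency_row, chosen, accesses):
    """Extra cycles spent when `accesses` accesses go to `chosen`
    instead of the fastest memory for this core."""
    best = min(latency_row.values())
    return (latency_row[chosen] - best) * accesses


print(memories_by_latency(LATENCY_C0))   # ['L0', 'G0', 'L1', 'G1', 'L2']
print(penalty(LATENCY_C0, "L2", 100))    # 400 (hypothetical 100 accesses)
```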

For this reason, the storage memory determination unit 10h determines, from among the plurality of memories (for example, the memories L0 to L2 and G0 to G1), the memory in which each data item is to be stored so that the access time to the data by the cores C0 to C2 is shortened as a whole. It does so based on the task execution frequency information, the data access information of each processing unit, the core allocation information of each processing unit, and the access latency information 2. The memory map generation unit 10i then generates a memory map 18b that indicates the relationship between each data item and its storage destination memory as determined by the storage memory determination unit 10h, together with address information indicating where each data item is stored within that memory.

As described above, the parallel program 18a generated by the computer 10 serving as the parallelization tool is compiled by the compiler 19 and converted into the parallel program 18a', which has been translated into binary code. As shown in FIG. 2, the compiler 19 refers to the memory map 18b supplied by the computer 10 and determines, for the compiled parallel program 18a', the memory in which each data item is stored and the address indicating its location in that memory, so that the relationship between data and storage destination memory given in the memory map 18b is satisfied. In other words, the compiler 19 determines the storage location of each data item according to the storage destination memory and the address information for that memory. By using the memory map 18b in this way, the memory that stores each data item when the multi-core microcomputer 20 executes the parallel program 18a' can be set so as to satisfy the relationship between data and storage destination memory given in the memory map 18b. As a result, the data access time of the cores C0 to C2 of the multi-core microcomputer 20 can be shortened as a whole, which in turn shortens the execution time of the parallel program 18a'.

When a memory map has been generated for the data ROM and/or the code ROM, the constant data and/or function data may be stored at the specified addresses of the corresponding data ROM and/or code ROM in accordance with the memory map 18b when the parallel program 18a' is installed in the ROM of the multi-core microcomputer 20.

The processing by which the storage memory determination unit 10h determines the memory for storing data from among the plurality of memories, and by which the memory map generation unit 10i generates the memory map, is described in detail below with reference to the flowcharts of FIGS. 7 to 9 and the explanatory diagrams of FIGS. 10 to 12. The flowchart of FIG. 7 shows the processing for determining the storage destination memory and creating the memory map for data stored in the RAM. The flowchart of FIG. 8 shows the corresponding processing for data stored in the data ROM, and the flowchart of FIG. 9 shows the corresponding processing for data stored in the code ROM.

The processing shown in the flowchart of FIG. 7 is described first. In the first step S100, the execution frequency information of each task is acquired from the task execution frequency analysis unit 10f. The core allocation information, which indicates the allocation of the processing units of each task to the cores C0 to C2 as shown in FIG. 3, is acquired from the core allocation and scheduling unit 10d. In addition, the data access information for the data stored in the RAM, as shown in FIG. 4, is acquired from the data access analysis unit 10g. These items of information need not all be acquired at once in step S100. For example, part of each item may be acquired over several passes, and the parts acquired so far may be combined once all of the information has been collected.

In the following step S110, the access frequency from each of the cores C0 to C2 per unit time is calculated for each data item, based on the task execution frequency information, the core allocation information, and the data access information acquired in step S100. A specific example of calculating the access frequency from each of the cores C0 to C2 is described in detail with reference to the explanatory diagrams of FIGS. 10 and 11.

First, the access frequency (number of accesses) of each processing unit to each data item per unit time, as shown in FIG. 10, is calculated based on the execution frequency information of each task and the data access information. The per-unit-time access frequencies in FIG. 10 are calculated from the access information of each processing unit to each data item shown in FIG. 4. As shown in FIG. 4, the first task has an execution period of 1 ms and the second task has an execution period of 4 ms. The execution periods of the two tasks thus differ, and the first task is executed four times as often as the second task. The number of data accesses by the processing units of the first task and the number of data accesses by the processing units of the second task shown in FIG. 4 therefore cannot be treated as equivalent as they stand.

For this reason, the execution period of each task is used to calculate the number of accesses to each data item by the processing units of each task per unit time. This puts the access counts of the processing units of different tasks on an equal footing. In the example of FIG. 10, the unit time is 8 ms. Within the 8 ms unit time, the processing units of the first task are executed eight times and the processing units of the second task are executed twice. The access counts in the data access information of FIG. 4 are therefore multiplied by 8 for the processing units of the first task and by 2 for the processing units of the second task. This yields the access frequency of each processing unit to each data item per unit time shown in FIG. 10.
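The normalization described above can be sketched as follows, using the 8 ms unit time and the 1 ms and 4 ms execution periods from the description. The function names are hypothetical, and the check that the unit time is a whole multiple of the period is an assumption of this sketch and is not a requirement stated in the disclosure.

```python
def executions_per_unit_time(period_ms, unit_time_ms):
    """Number of times a periodic task runs within the unit time."""
    if unit_time_ms % period_ms != 0:
        # Assumption of this sketch: keep the scaled counts integral.
        raise ValueError("unit time should be a multiple of the task period")
    return unit_time_ms // period_ms


def scale_accesses(accesses_per_execution, period_ms, unit_time_ms=8):
    """Convert per-execution access counts into per-unit-time access counts."""
    factor = executions_per_unit_time(period_ms, unit_time_ms)
    return {data_no: count * factor
            for data_no, count in accesses_per_execution.items()}


# Processing unit 1 of the first task (1 ms period): data 1, 3, and 7 once each.
print(scale_accesses({1: 1, 3: 1, 7: 1}, period_ms=1))  # {1: 8, 3: 8, 7: 8}
```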

Next, the access frequency of each of the cores C0 to C2 to each data item per unit time, as shown in FIG. 11, is calculated by aggregating the access frequency (number of accesses) to each data item for each core. This aggregation is based on the per-unit-time access frequency of each processing unit to each data item, as shown in FIG. 10, and on the allocation information of each processing unit to the cores C0 to C2, as shown in FIG. 3. For example, when the core allocation information of FIG. 3 is obtained, the processing units numbered "1" and "4" of the first task and the processing units numbered "1" and "4" of the second task are allocated to the core C0. To obtain the number of accesses from the core C0 to each data item per unit time, the total number of accesses by all processing units allocated to the core C0 (the processing units numbered "1" and "4" of the first task and the processing units numbered "1" and "4" of the second task) is calculated for each data item. Similarly, to obtain the number of accesses from the cores C1 and C2 to each data item per unit time, the total number of accesses by all processing units allocated to the cores C1 and C2 is calculated for each data item. Compiling the total number of accesses from each of the cores C0 to C2 to each data item in this way yields the access frequency of each of the cores C0 to C2 to each data item per unit time shown in FIG. 11.
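A minimal sketch of this per-core aggregation is given below. The function name and data layout are assumptions. The per-unit-time counts in the example cover only data "1", and apart from processing unit 1 of the first task they are invented so that the totals match the 18, 2, and 16 accesses that the description gives for data "1"; they are not taken from FIG. 10.

```python
from collections import defaultdict


def accesses_per_core(unit_accesses, allocation):
    """Sum per-unit-time access counts over all processing units on each core.

    unit_accesses: {(task, unit): {data_no: accesses per unit time}}
    allocation:    {(task, unit): core}
    """
    totals = defaultdict(lambda: defaultdict(int))
    for unit, accesses in unit_accesses.items():
        core = allocation[unit]
        for data_no, count in accesses.items():
            totals[core][data_no] += count
    return {core: dict(per_data) for core, per_data in totals.items()}


# Hypothetical per-unit-time counts for data "1" only (see lead-in).
unit_accesses = {
    ("task1", 1): {1: 8}, ("task1", 4): {1: 8}, ("task2", 1): {1: 2},
    ("task2", 2): {1: 2},
    ("task1", 3): {1: 8}, ("task1", 6): {1: 8},
}
allocation = {
    ("task1", 1): "C0", ("task1", 4): "C0", ("task2", 1): "C0",
    ("task2", 2): "C1",
    ("task1", 3): "C2", ("task1", 6): "C2",
}
print(accesses_per_core(unit_accesses, allocation))
# {'C0': {1: 18}, 'C1': {1: 2}, 'C2': {1: 16}}
```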

Next, in step S120 of the flowchart of FIG. 7, the access latency information on the time required to access each of the RAMs L0, L1, L2, G0, and G1 from each of the cores C0 to C2, as shown in FIG. 6, is acquired. In the following step S130, the total required access time per unit time is calculated for each data item on the assumption that it is stored in each of the RAMs L0, L1, L2, G0, and G1. This total can be calculated from the per-unit-time access frequency from each core in FIG. 11 and the access latency information in FIG. 6.

For example, if the data numbered "1" is stored in the RAM L0, the access frequency from the core C0 to that data is 18 and the required access time to the RAM L0 is 1 cycle, so the total required access time from the core C0 to the RAM L0 is 18 × 1 = 18 cycles. The totals for the cores C1 and C2 accessing the data numbered "1" stored in the RAM L0 are 2 × 3 = 6 cycles and 16 × 5 = 80 cycles, respectively. Accordingly, when the data numbered "1" is stored in the RAM L0, the total required access time from the cores C0 to C2 is 18 + 6 + 80 = 104 cycles. The total required access time from the cores C0 to C2 when the data numbered "1" is stored in each of the other RAMs L1, L2, G0, and G1 can be obtained in the same way. The totals per unit time are likewise obtained for the other data items on the assumption that each is stored in each of the RAMs L0, L1, L2, G0, and G1, and compiling them yields the total required access time data shown in FIG. 12.
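The arithmetic of this worked example can be reproduced with a short sketch. The access frequencies and latencies are the values stated above for data "1" and the RAM L0; the function and variable names are hypothetical.

```python
def total_access_time(freq_by_core, latency_to_memory):
    """Sum over cores of (accesses per unit time) x (cycles per access)."""
    return sum(freq_by_core[core] * latency_to_memory[core]
               for core in freq_by_core)


# Values stated in the description for data "1" stored in the RAM L0.
freq_data1 = {"C0": 18, "C1": 2, "C2": 16}   # accesses per unit time
latency_to_L0 = {"C0": 1, "C1": 3, "C2": 5}  # cycles per access

print(total_access_time(freq_data1, latency_to_L0))  # 104
```

Repeating the call with the latency column of each other RAM would give one row of a table such as the one in FIG. 12.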

In step S140, the memory with the smallest total required access time is determined, for each data item, as the memory for storing that data item, based on the total required access time data. For example, when the total required access time data shown in FIG. 12 is obtained, the RAM L0 and the RAM G0 have the smallest total required access time for the data numbered "1". The memory for storing the data numbered "1" is therefore determined to be the RAM L0 (or the RAM G0). The memory for storing the data of each other data number can be determined in the same way.
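The selection in step S140 amounts to taking the minimum over the candidate memories. In the sketch below, the total of 104 cycles for the RAM L0 comes from the worked example, and the equal total for the RAM G0 follows from the statement that both are smallest; the totals for L1, L2, and G1 are invented placeholders, since the description does not give them. Breaking a tie by taking the first listed memory is likewise an assumption, as the description leaves the choice open.

```python
def select_memory(totals):
    """Return (chosen, tied): the first memory with the smallest total
    required access time, and the list of all memories sharing that minimum."""
    best = min(totals.values())
    tied = [memory for memory, total in totals.items() if total == best]
    return tied[0], tied


# Totals for data "1"; L1, L2, and G1 are hypothetical placeholder values.
totals_data1 = {"L0": 104, "L1": 130, "L2": 120, "G0": 104, "G1": 112}

print(select_memory(totals_data1))  # ('L0', ['L0', 'G0'])
```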

Then, in step S150, in addition to the relationship between each data item and its storage destination RAM determined in step S140, the address of the storage location within the storage destination RAM is determined, and the memory map 18b is generated from the determined address information.
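The description does not specify how addresses are assigned within each RAM. One simple possibility, shown here purely as an assumption, is to place the data items sequentially from a base address with a fixed alignment. The base addresses, data sizes, alignment, and all names below are hypothetical, and the sketch does not check memory capacity.

```python
def build_memory_map(placement, sizes, base_addresses, align=4):
    """Assign each data item an address in its storage destination memory.

    placement:      {data_no: memory name}
    sizes:          {data_no: size in bytes}
    base_addresses: {memory name: first usable address}
    Returns {data_no: (memory name, address)}.
    """
    next_free = dict(base_addresses)
    memory_map = {}
    for data_no in sorted(placement):
        memory = placement[data_no]
        # Round the next free address up to the alignment boundary.
        address = (next_free[memory] + align - 1) // align * align
        memory_map[data_no] = (memory, address)
        next_free[memory] = address + sizes[data_no]
    return memory_map


placement = {1: "L0", 2: "L1", 3: "L0", 4: "L1"}   # hypothetical placement
sizes = {1: 4, 2: 2, 3: 1, 4: 4}                   # hypothetical sizes in bytes
bases = {"L0": 0x1000, "L1": 0x2000}               # hypothetical base addresses

for data_no, (memory, address) in build_memory_map(placement, sizes, bases).items():
    print(data_no, memory, hex(address))
```

A table of this kind could then be handed to the compiler or linker, for example as section placement directives, although the concrete format is outside what the description states.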

The above is the processing for determining the storage destination memory and creating the memory map 18b for data stored in the RAM. For data stored in the data ROM and the code ROM, the storage destination memory can be determined and the memory map 18b created by processing that is basically the same as that of the flowchart of FIG. 7, as shown in the flowcharts of FIGS. 8 and 9. A detailed description of the flowcharts of FIGS. 8 and 9 is therefore omitted.

A preferred embodiment of the present invention has been described above. However, the present invention is in no way limited to the above embodiment, and various modifications are possible without departing from the spirit of the present invention.

(Modification 1)
For example, the above embodiment describes a case in which the multi-core microcomputer 20 has a plurality of RAMs, a plurality of data ROMs, and a plurality of code ROMs, and the data stored in the RAM, the data ROM, and the code ROM is each stored in the optimum memory. However, the multi-core microcomputer 20 may instead have a plurality of memories for at least one type of memory among the RAM, the data ROM, and the code ROM, and the optimum memory for storing data may be determined from among that plurality of memories.

(Modification 2)
The above embodiment describes a case in which, to determine the memory for storing data, the total required access time from the cores C0 to C2 per unit time is obtained on the assumption that each data item is stored in each memory, and the memory with the smallest total is selected. However, the method of determining the memory for storing data is not limited to this. For example, the core with the highest access frequency per unit time may be selected for each data item, and the memory with the shortest required access time from that core may be determined as the memory for storing that data item. In this case, when a plurality of cores share the highest access frequency per unit time, the global memory located between those cores is preferably determined as the memory for storing the data item.
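The alternative heuristic of Modification 2 might be sketched as follows. The latency row for the core C0 uses the cycle counts stated for FIG. 6, and the association of the global RAM G0 with the pair of cores C0 and C1 follows the bus arrangement described for FIG. 5. The function name, the `shared_memory` lookup table, and the tied access counts in the second call are assumptions made for illustration.

```python
def select_by_top_core(freq_by_core, latency, shared_memory):
    """Pick the memory closest to the most frequently accessing core.

    freq_by_core:  {core: accesses per unit time}
    latency:       {core: {memory: cycles}}
    shared_memory: {frozenset of cores: global memory located between them}
    """
    top = max(freq_by_core.values())
    top_cores = [core for core, freq in freq_by_core.items() if freq == top]
    if len(top_cores) == 1:
        row = latency[top_cores[0]]
        return min(row, key=row.get)
    # Several cores tie for the highest frequency: use the global memory between them.
    return shared_memory[frozenset(top_cores)]


latency = {"C0": {"L0": 1, "L1": 3, "L2": 5, "G0": 2, "G1": 4}}
shared_memory = {frozenset({"C0", "C1"}): "G0"}

print(select_by_top_core({"C0": 18, "C1": 2, "C2": 16}, latency, shared_memory))  # L0
print(select_by_top_core({"C0": 8, "C1": 8, "C2": 0}, latency, shared_memory))    # G0
```

Unlike the total-based selection of the embodiment, this heuristic ignores the accesses from the remaining cores, so the two approaches need not agree.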

(Modification 3)
In the above embodiment, a single required access time from each of the cores C0 to C2 to each memory is used as the access latency information. However, when the required access time differs significantly between an access in which a core C0 to C2 reads data from a memory and an access in which it writes data to a memory, the required read time and the required write time may be defined separately as the access latency information.
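With separate read and write latencies, the total required access time would be split into a read term and a write term for each core. The sketch below shows one way to do this; every number in it is hypothetical, since the description gives neither separate read and write counts nor separate latencies, and the function name is likewise an assumption.

```python
def total_access_time_rw(reads, writes, read_latency, write_latency):
    """Total cycles per unit time when reads and writes have different latencies.

    All arguments map a core name to a count (reads, writes) or to a
    cycle count (read_latency, write_latency) for one candidate memory.
    """
    return sum(reads[core] * read_latency[core]
               + writes[core] * write_latency[core]
               for core in reads)


# Hypothetical values for one data item and one candidate memory.
reads = {"C0": 10, "C1": 2}
writes = {"C0": 8, "C1": 0}
read_latency = {"C0": 1, "C1": 3}
write_latency = {"C0": 2, "C1": 4}

print(total_access_time_rw(reads, writes, read_latency, write_latency))  # 32
```

This presupposes that the data access information distinguishes read accesses from write accesses, which the embodiment does not state.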

(Modification 4)
The above embodiment describes a case in which the multi-core microcomputer 20 is applied as an in-vehicle device, but the application of the multi-core microcomputer 20 is not limited to this.

1: storage medium, 1a: automatic parallelizing compiler, 2: access latency information, 10: computer, 10a: lexical analysis unit, 10b: syntax/semantic analysis unit, 10c: dependency analysis unit, 10d: core allocation and scheduling unit, 10e: code generation unit, 10f: task execution frequency analysis unit, 10g: data access analysis unit, 10h: storage memory determination unit, 10i: memory map generation unit, 11: display, 12: HDD, 13: CPU, 14: ROM, 15: RAM, 16: input device, 17: reading unit, 19: compiler, 20: multi-core microcomputer

Claims

A parallelization method for generating, from a single program for a single-core microcomputer having one core, a parallel program (18a) for a multi-core microcomputer (20) that includes a plurality of cores (C0, C1, C2) and a plurality of memories (L0, L1, L2, G0, G1) accessible by the plurality of cores, the plurality of memories including memories that differ in the required access time needed for access by the plurality of cores, the parallelization method comprising:
a parallel program generation procedure (10a to 10e) of determining, for each task that is included in the single program and is composed of a plurality of processing units, the allocation of the plurality of processing units to the plurality of cores and their execution order based on dependencies among the plurality of processing units, and generating the parallel program so that the plurality of processing units are executed according to the determined allocation to the plurality of cores and the determined execution order;
a storage memory determination procedure (10h) of determining, from among the plurality of memories, a memory in which to store data so that the access time to the data by the plurality of cores is shortened as a whole, based on allocation information of the plurality of processing units to the plurality of cores, data access information regarding the data accessed by the plurality of processing units, execution frequency information indicating the execution frequency of the task, and required access time information indicating the required access time of the plurality of cores for each of the plurality of memories; and
a memory map generation procedure (10i) of generating a memory map (18b) that indicates the relationship between the data and its storage destination memory as determined by the storage memory determination procedure.

The parallelization method according to claim 1, wherein the storage memory determination procedure
includes an access frequency calculation procedure (S110, S210, S310) that calculates the access frequency of the data by the plurality of cores per unit time based on the allocation information of the plurality of processing units to the plurality of cores, the data access information regarding the data accessed by the plurality of processing units, and the execution frequency information of the task, and
determines the memory in which to store the data based on the access frequency of the data by the plurality of cores and the required access time information for each of the plurality of memories by the plurality of cores.
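
The access frequency calculation can be sketched as follows, under stated assumptions: each processing unit belongs to one task, the data access information gives the number of accesses to each data item per execution of a unit, and the execution frequency is given in executions per second. The function name `access_frequency` and all input names and values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def access_frequency(allocation, accesses, task_of_unit, task_freq):
    """Return freq[data][core]: accesses to each data item per unit time, per core.

    allocation:   processing unit -> core it is allocated to
    accesses:     processing unit -> {data: accesses per execution}
    task_of_unit: processing unit -> task it belongs to
    task_freq:    task -> executions per unit time
    """
    freq = defaultdict(lambda: defaultdict(float))
    for unit, core in allocation.items():
        runs_per_unit_time = task_freq[task_of_unit[unit]]
        for data, count in accesses.get(unit, {}).items():
            freq[data][core] += count * runs_per_unit_time
    return {data: dict(per_core) for data, per_core in freq.items()}


# Hypothetical inputs: three processing units on two cores, two tasks.
allocation = {"MT1": "C0", "MT2": "C1", "MT3": "C0"}
task_of_unit = {"MT1": "task_1ms", "MT2": "task_1ms", "MT3": "task_10ms"}
task_freq = {"task_1ms": 1000.0, "task_10ms": 100.0}  # executions per second
accesses = {"MT1": {"x": 2}, "MT2": {"x": 1, "y": 3}, "MT3": {"y": 1}}

freq = access_frequency(allocation, accesses, task_of_unit, task_freq)
print(freq)
```

With these inputs, data `x` is accessed 2000 times per second from C0 and 1000 times per second from C1, which is the kind of per-core figure the memory determination then weighs against the required access times.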

The parallelization method according to claim 2, wherein the storage memory determination procedure
includes a memory selection procedure (S140, S240, S340) that selects, as the memory in which to store the data, the memory that minimizes the total required access time of the plurality of cores per unit time, based on the access frequency of the data by the plurality of cores and the required access time information for each of the plurality of memories by the plurality of cores, and
determines the memory selected by the memory selection procedure as the memory in which to store the data.
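
The selection criterion above amounts to choosing, for each data item, the memory m that minimizes the sum over cores of (access frequency from that core) × (required access time from that core to m). The sketch below assumes a latency table indexed by memory and core; the function name `select_memory` and the latency values are hypothetical, and memory capacity limits are deliberately ignored.

```python
def select_memory(freq_by_core, latency):
    """Pick the memory with the smallest total required access time per unit time.

    freq_by_core: core -> accesses to one data item per unit time
    latency:      memory -> {core: required access time for one access}
    Returns (memory, total required access time per unit time).
    """
    def total(memory):
        return sum(n * latency[memory][core] for core, n in freq_by_core.items())

    best = min(latency, key=total)
    return best, total(best)


# Hypothetical required access times (in cycles): L0 is local to C0,
# L1 is local to C1, and G0 is a global memory equally far from both.
latency = {
    "L0": {"C0": 1, "C1": 4},
    "L1": {"C0": 4, "C1": 1},
    "G0": {"C0": 2, "C1": 2},
}

# Data accessed mostly from C0 ends up in C0's local memory.
mostly_c0 = select_memory({"C0": 3000, "C1": 500}, latency)
# Data accessed equally from both cores ends up in the global memory.
shared = select_memory({"C0": 1000, "C1": 1000}, latency)
print(mostly_c0, shared)
```

The second case shows why the per-core access frequency matters: neither local memory is best for evenly shared data, even though each is the fastest memory for one of the cores.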

The parallelization method according to any one of claims 1 to 3, wherein the data accessed by the plurality of cores includes any of variable or constant data stored in a RAM, constant data stored in a data ROM, and function data that is stored in a code ROM and shared as the processing unit.

The parallelization method according to claim 4, wherein the multi-core microcomputer includes a plurality of memories of at least one type among the RAM, the data ROM, and the code ROM.

A multi-core microcomputer (20) that executes the parallel program generated by the parallelization method according to any one of claims 1 to 5,
wherein, when each of the plurality of cores of the multi-core microcomputer accesses the data in order to execute the processing unit allocated to that core, the memory to be accessed is the memory defined by the memory map.

A parallelization tool for generating, from a single program for a single-core microcomputer having one core, a parallel program (18a) for a multi-core microcomputer (20) having a plurality of cores (C0, C1, C2) and a plurality of memories (L0, L1, L2, G0, G1) accessible by the plurality of cores, the plurality of memories including memories whose required access times for access by the plurality of cores differ, the parallelization tool comprising:
a parallel program generation unit (10a to 10e) that, for each task composed of a plurality of processing units included in the single program, determines the allocation of the plurality of processing units to the plurality of cores and their execution order based on the dependencies among the plurality of processing units, and generates the parallel program so that the plurality of processing units are executed according to the determined allocation and execution order;
a storage memory determination unit (10h) that determines, from among the plurality of memories, the memory in which to store data so that the access time to the data by the plurality of cores is shortened as a whole, based on allocation information of the plurality of processing units to the plurality of cores, data access information regarding the data accessed by the plurality of processing units, execution frequency information indicating the execution frequency of the task, and required access time information indicating the required access time for each of the plurality of memories by the plurality of cores; and
a memory map generation unit (10i) that generates a memory map (18b) indicating the relationship between the data and the storage destination memory determined for the data by the storage memory determination unit.
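
One way to picture how a parallel program generation unit might derive an allocation and execution order from dependencies is greedy list scheduling, sketched below. This is a standard textbook technique substituted for illustration, not the method disclosed in this patent; the function name `schedule`, the per-unit execution times, and the unit names are assumptions.

```python
from graphlib import TopologicalSorter


def schedule(deps, cost, cores):
    """Greedy list scheduling of processing units onto cores.

    deps:  processing unit -> set of units it depends on
    cost:  processing unit -> execution time (assumed to be known)
    cores: list of core names
    Returns (plan, finish): per-core execution order and finish times.
    """
    core_free = {core: 0 for core in cores}  # time at which each core becomes idle
    finish = {}
    plan = {core: [] for core in cores}
    # Visit units so that every unit comes after the units it depends on.
    for unit in TopologicalSorter(deps).static_order():
        ready = max((finish[d] for d in deps.get(unit, ())), default=0)
        # Choose the core on which this unit can start earliest.
        core = min(cores, key=lambda c: max(core_free[c], ready))
        start = max(core_free[core], ready)
        finish[unit] = start + cost[unit]
        core_free[core] = finish[unit]
        plan[core].append(unit)
    return plan, finish


# Hypothetical task: MT2 and MT3 both depend on MT1, and MT4 depends on both.
deps = {"MT1": set(), "MT2": {"MT1"}, "MT3": {"MT1"}, "MT4": {"MT2", "MT3"}}
cost = {"MT1": 2, "MT2": 3, "MT3": 3, "MT4": 1}
plan, finish = schedule(deps, cost, ["C0", "C1"])
print(plan, finish["MT4"])
```

In this example MT2 and MT3 run in parallel on different cores once MT1 has finished, so the task completes at time 6 instead of the 9 time units that sequential execution on one core would take.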

The parallelization tool according to claim 7, wherein the storage memory determination unit
includes an access frequency calculation unit (S110, S210, S310) that calculates the access frequency of the data by the plurality of cores per unit time based on the allocation information of the plurality of processing units to the plurality of cores, the data access information regarding the data accessed by the plurality of processing units, and the execution frequency information of the task, and
determines the memory in which to store the data based on the access frequency of the data by the plurality of cores and the required access time information for each of the plurality of memories by the plurality of cores.

The parallelization tool according to claim 8, wherein the storage memory determination unit
includes a memory selection unit (S140, S240, S340) that selects, as the memory in which to store the data, the memory that minimizes the total required access time of the plurality of cores per unit time, based on the access frequency of the data by the plurality of cores and the required access time information for each of the plurality of memories by the plurality of cores, and
determines the memory selected by the memory selection unit as the memory in which to store the data.

The parallelization tool according to any one of claims 7 to 9, wherein the data accessed by the plurality of cores includes any of variable or constant data stored in a RAM, constant data stored in a data ROM, and function data that is stored in a code ROM and shared as the processing unit.

The parallelization tool according to claim 10, wherein the multi-core microcomputer includes a plurality of memories of at least one type among the RAM, the data ROM, and the code ROM.

A multi-core microcomputer (20) that executes the parallel program generated by the parallelization tool according to any one of claims 7 to 11,
wherein, when each of the plurality of cores of the multi-core microcomputer accesses the data in order to execute the processing unit allocated to that core, the memory to be accessed is the memory defined by the memory map.

A multi-core microcomputer (20) that has a plurality of cores (C0, C1, C2) and executes a parallel program (18a) for the multi-core microcomputer, the parallel program being generated from a single program for a single-core microcomputer having one core, the multi-core microcomputer comprising
a plurality of memories (L0, L1, L2, G0, G1) accessible by the plurality of cores, the plurality of memories including memories whose required access times for access by the plurality of cores differ, wherein
the parallel program is generated such that, for each task composed of a plurality of processing units included in the single program, the allocation of the plurality of processing units to the plurality of cores and their execution order are determined based on the dependencies among the plurality of processing units, and the plurality of processing units are executed according to the determined allocation and execution order, and
when the plurality of cores access data stored in the plurality of memories in order to execute the allocated processing units, the memory to be accessed is the memory defined by a memory map (18b), the memory map being generated to indicate the relationship between the data and its storage destination memory, and the storage destination memory being determined from among the plurality of memories so that the access time to the data by the plurality of cores is shortened as a whole, based on allocation information of the plurality of processing units to the plurality of cores, data access information regarding the data accessed by the plurality of processing units, execution frequency information indicating the execution frequency of the task, and required access time information indicating the required access time for each of the plurality of memories by the plurality of cores.

The multi-core microcomputer according to any one of claims 6, 12, and 13, wherein the multi-core microcomputer is applied to an in-vehicle apparatus that controls in-vehicle equipment mounted on a vehicle, and controls the in-vehicle equipment by executing the parallel program.

The multi-core microcomputer according to claim 13 or 14, wherein the memory map is generated by calculating the access frequency of the data by the plurality of cores based on the allocation information of the plurality of processing units to the plurality of cores, the data access information regarding the data accessed by the plurality of processing units, and the execution frequency information of the task, by determining the memory in which to store the data based on the calculated access frequency of the data by the plurality of cores and the required access time information for each of the plurality of memories by the plurality of cores, and by generating the memory map according to the determined content.

The multi-core microcomputer according to claim 15, wherein, based on the access frequency of the data by the plurality of cores and the required access time information for each of the plurality of memories by the plurality of cores, the memory that minimizes the total required access time of the plurality of cores per unit time is selected as the memory in which to store the data, and the selected memory is determined as the memory in which to store the data.

The multi-core microcomputer according to any one of claims 13 to 16, wherein the data accessed by the plurality of cores includes any of variable or constant data stored in a RAM, constant data stored in a data ROM, and function data that is stored in a code ROM and shared as the processing unit.

The multi-core microcomputer according to claim 17, wherein the multi-core microcomputer includes a plurality of memories of at least one type among the RAM, the data ROM, and the code ROM.